Method and apparatus for generating 3D audio content from two-channel stereo content

ABSTRACT

For generating 3D audio content from a two-channel stereo signal, the stereo signal (x(t)) is partitioned into overlapping sample blocks and is transformed into time-frequency domain. From the stereo signal directional and ambient signal components are separated, wherein the estimated directions of the directional components are changed by a predetermined factor, wherein, if changes are within a predetermined interval, they are combined in order to form a directional centre channel object signal. For the other directions an encoding to Higher Order Ambisonics (HOA) is performed. Additional ambient signal channels are generated by de-correlation and rating by gain factors, followed by encoding to HOA. The directional HOA signals and the ambient HOA signals are combined, and the combined HOA signal and the centre channel object signals are transformed to time domain.

CROSS-REFERENCE TO RELATED APPLICATION

This application claims priority to European Patent Application No. 15306544.6, filed on Sep. 30, 2015, which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

The invention relates to a method and to an apparatus for generating 3D audio scene or object based content from two-channel stereo based content.

BACKGROUND

The invention is related to the creation of 3D audio scene/object based audio content from two-channel stereo channel based content. Some references related to up mixing two-channel stereo content to 2D surround channel based content include: [2] V. Pulkki, “Spatial sound reproduction with directional audio coding”, J. Audio Eng. Soc., vol. 55, no. 6, pp. 503-516, June 2007; [3] C. Avendano, J. M. Jot, “A frequency-domain approach to multichannel upmix”, J. Audio Eng. Soc., vol. 52, no. 7/8, pp. 740-749, July/August 2004; [4] M. M. Goodwin, J. M. Jot, “Spatial audio scene coding”, in Proc. 125th Audio Eng. Soc. Conv., 2008, San Francisco, Calif.; [5] V. Pulkki, “Virtual sound source positioning using vector base amplitude panning”, J. Audio Eng. Soc., vol. 45, no. 6, pp. 456-466, June 1997; [6] J. Thompson, B. Smith, A. Warner, J. M. Jot, “Direct-diffuse decomposition of multichannel signals using a system of pair-wise correlations”, Proc. 133rd Audio Eng. Soc. Conv., 2012, San Francisco, Calif.; [7] C. Faller, “Multiple-loudspeaker playback of stereo signals”, J. Audio Eng. Soc., vol. 54, no. 11, pp. 1051-1064, November 2006; [8] M. Briand, D. Virette, N. Martin, “Parametric representation of multi-channel audio based on principal component analysis”, Proc. 120th Audio Eng. Soc. Conv, 2006, Paris; [9] A. Walther, C. Faller, “Direct-ambient decomposition and upmix of surround signals”, Proc. IWASPAA, pp. 277-280, October 2011, New Paltz, N.Y.; [10] E. G. Williams, “Fourier Acoustics”, Applied Mathematical Sciences, vol. 93, 1999, Academic Press; [11] B. Rafaely, “Plane-wave decomposition of the sound field on a sphere by spherical convolution”, J. Acoust. Soc. Am., 4(116), pages 2149-2157, October 2004.

Additional information is also included in [1] ISO/IEC IS 23008-3, “Information technology - High efficiency coding and media delivery in heterogeneous environments - Part 3: 3D audio”.

SUMMARY OF INVENTION

Loudspeaker setups that are not fixed to one loudspeaker may be addressed by special up/down-mix or re-rendering processing.

When an original spatial virtual position is altered, timbre and loudness artefacts can occur for encodings of two-channel stereo to Higher Order Ambisonics (denoted HOA) using the speaker positions as plane wave origins.

In the context of spatial audio, while both audio image sharpness and spaciousness may be desirable, the two may have contradictory requirements. Sharpness allows an audience to clearly identify directions of audio sources, while spaciousness enhances a listener's feeling of envelopment.

The present disclosure is directed to maintaining both sharpness and spaciousness after converting two-channel stereo channel based content to 3D audio scene/object based audio content.

A primary ambient decomposition (PAD) may separate directional and ambient components found in channel based audio. The directional component is an audio signal related to a source direction. This directional component may be manipulated to determine a new directional component. The new directional component may be encoded to HOA, except for the centre channel direction where the related signal is handled as a static object channel. Additional ambient representations are derived from the ambient components. The additional ambient representations are encoded to HOA.

The encoded HOA directional and ambient components may be combined and an output of the combined HOA representation and the centre channel signal may be provided.

In one example, this processing may be represented as:

-   A) A two-channel stereo signal $x(t)$ is partitioned into overlapping sample blocks. The partitioned signals are transformed into the time-frequency (T/F) domain using a filter bank, for example by means of an FFT. The transformation may determine T/F tiles.
-   B) In the T/F domain, direct and ambient signal components are separated from the two-channel stereo signal $x(t)$ based on:
    -   B.1) Estimating ambient power $P_N(\hat{t},k)$, direct power $P_s(\hat{t},k)$, source directions $\varphi_s(\hat{t},k)$, and mixing coefficients $a$ for the directional signal components to be extracted.
    -   B.2) Extracting: (i) two ambient T/F signal channels $n(\hat{t},k)$ and (ii) one directional signal component $s(\hat{t},k)$ for each T/F tile related to each estimated source direction $\varphi_s(\hat{t},k)$ from B.1.
    -   B.3) Manipulating the estimated source directions $\varphi_s(\hat{t},k)$ by a stage_width factor.
        -   B.3.a) If the manipulated directions related to the T/F tile components are within an interval of ± center_channel_capture_width factor $c_W$, they are combined in order to form a directional centre channel object signal $o_c(\hat{t},k)$ in the T/F domain.
        -   B.3.b) For directions other than those in B.3.a), the directional T/F tiles are encoded to HOA using a spherical harmonic encoding vector $y_s(\hat{t},k)$ derived from the manipulated source directions, thus creating a directional HOA signal $b_s(\hat{t},k)$ in the T/F domain.
    -   B.4) Deriving additional ambient signal channels $\overset{...}{n}(\hat{t},k)$ by de-correlating the extracted ambient channels $n(\hat{t},k)$, rating these channels by gain factors $g_L$, and encoding all ambient channels to HOA by creating a spherical harmonics encoding matrix $\Psi_{\overset{...}{n}}$ from predefined positions, thus creating an ambient HOA signal $b_{\overset{...}{n}}(\hat{t},k)$ in the T/F domain.
-   C) Creating a combined HOA signal $b(\hat{t},k)$ in the T/F domain by combining the directional HOA signals $b_s(\hat{t},k)$ and the ambient HOA signals $b_{\overset{...}{n}}(\hat{t},k)$.
-   D) Transforming this HOA signal $b(\hat{t},k)$ and the centre channel object signals $o_c(\hat{t},k)$ to time domain by using an inverse filter bank.
-   E) Storing or transmitting the resulting time domain HOA signal $b(t)$ and the centre channel object signal $o_c(t)$ using an MPEG-H 3D Audio data rate compression encoder.

A new format may utilize HOA for encoding spatial audio information plus a static object for encoding a centre channel. The new 3D audio scene/object content can be used when enhancing or upmixing legacy stereo content to 3D audio. The content may then be transmitted based on any MPEG-H compression and can be used for rendering to any loudspeaker setup.

In principle, the inventive method is adapted for generating 3D audio scene and object based content from two-channel stereo based content, and includes:

-   partitioning a two-channel stereo signal into overlapping sample blocks followed by a transform into the time-frequency (T/F) domain;
-   separating direct and ambient signal components from said two-channel stereo signal in T/F domain by:
    -   estimating ambient power, direct power, source directions $\varphi_s(\hat{t},k)$ and mixing coefficients for directional signal components to be extracted;
    -   extracting two ambient T/F signal channels $n(\hat{t},k)$ and one directional signal component $s(\hat{t},k)$ for each T/F tile related to an estimated source direction $\varphi_s(\hat{t},k)$;
    -   changing said estimated source directions by a predetermined factor, wherein, if said changed directions related to the T/F tile components are within a predetermined interval, they are combined in order to form a directional centre channel object signal $o_c(\hat{t},k)$ in T/F domain, and for the other changed directions outside of said interval, encoding the directional T/F tiles to Higher Order Ambisonics HOA using a spherical harmonic encoding vector derived from said changed source directions, thereby generating a directional HOA signal $b_s(\hat{t},k)$ in T/F domain;
    -   generating additional ambient signal channels $\overset{...}{n}(\hat{t},k)$ by de-correlating said extracted ambient channels $n(\hat{t},k)$ and rating these channels by gain factors, and encoding all ambient channels to HOA by generating a spherical harmonics encoding matrix from predefined positions, thereby generating an ambient HOA signal $b_{\overset{...}{n}}(\hat{t},k)$ in T/F domain;
-   generating a combined HOA signal $b(\hat{t},k)$ in T/F domain by combining said directional HOA signals $b_s(\hat{t},k)$ and said ambient HOA signals $b_{\overset{...}{n}}(\hat{t},k)$;
-   transforming said combined HOA signal $b(\hat{t},k)$ and said centre channel object signals $o_c(\hat{t},k)$ to time domain.

In principle, the inventive apparatus is adapted for generating 3D audio scene and object based content from two-channel stereo based content, said apparatus including means adapted to:

-   partition a two-channel stereo signal into overlapping sample blocks followed by a transform into the time-frequency (T/F) domain;
-   separate direct and ambient signal components from said two-channel stereo signal in T/F domain by:
    -   estimating ambient power, direct power, source directions $\varphi_s(\hat{t},k)$ and mixing coefficients for directional signal components to be extracted;
    -   extracting two ambient T/F signal channels $n(\hat{t},k)$ and one directional signal component $s(\hat{t},k)$ for each T/F tile related to an estimated source direction $\varphi_s(\hat{t},k)$;
    -   changing said estimated source directions by a predetermined factor, wherein, if said changed directions related to the T/F tile components are within a predetermined interval, they are combined in order to form a directional centre channel object signal $o_c(\hat{t},k)$ in T/F domain, and for the other changed directions outside of said interval, encoding the directional T/F tiles to Higher Order Ambisonics HOA using a spherical harmonic encoding vector derived from said changed source directions, thereby generating a directional HOA signal $b_s(\hat{t},k)$ in T/F domain;
    -   generating additional ambient signal channels $\overset{...}{n}(\hat{t},k)$ by de-correlating said extracted ambient channels $n(\hat{t},k)$ and rating these channels by gain factors, and encoding all ambient channels to HOA by generating a spherical harmonics encoding matrix from predefined positions, thereby generating an ambient HOA signal $b_{\overset{...}{n}}(\hat{t},k)$ in T/F domain;
-   generate (11, 31) a combined HOA signal $b(\hat{t},k)$ in T/F domain by combining said directional HOA signals $b_s(\hat{t},k)$ and said ambient HOA signals $b_{\overset{...}{n}}(\hat{t},k)$;
-   transform (11, 31) said combined HOA signal $b(\hat{t},k)$ and said centre channel object signals $o_c(\hat{t},k)$ to time domain.

In principle, the inventive method is adapted for generating 3D audio scene and object based content from two-channel stereo based content, and includes: receiving the two-channel stereo based content represented by a plurality of time/frequency (T/F) tiles; determining, for each tile, ambient power, direct power, source directions $\varphi_s(\hat{t},k)$ and mixing coefficients; determining, for each tile, a directional signal and two ambient T/F channels based on the corresponding ambient power, direct power, and mixing coefficients; and determining the 3D audio scene and object based content based on the directional signal and ambient T/F channels of the T/F tiles. In the method, for each tile, a new source direction may further be determined based on the source direction $\varphi_s(\hat{t},k)$, and, based on a determination that the new source direction is within a predetermined interval, a directional centre channel object signal $o_c(\hat{t},k)$ is determined based on the directional signal, the directional centre channel object signal $o_c(\hat{t},k)$ corresponding to the object based content, and, based on a determination that the new source direction is outside the predetermined interval, a directional HOA signal $b_s(\hat{t},k)$ is determined based on the new source direction. Moreover, for each tile, additional ambient signal channels $\overset{...}{n}(\hat{t},k)$ may be determined based on a de-correlation of the two ambient T/F channels, and ambient HOA signals $b_{\overset{...}{n}}(\hat{t},k)$ are determined based on the additional ambient signal channels. The 3D audio scene content is based on the directional HOA signals $b_s(\hat{t},k)$ and the ambient HOA signals $b_{\overset{...}{n}}(\hat{t},k)$.

BRIEF DESCRIPTION OF DRAWINGS

Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in:

FIG. 1 An exemplary HOA upconverter;

FIG. 2 Spherical and Cartesian reference coordinate system;

FIG. 3 An exemplary artistic interference HOA upconverter;

FIG. 4 Classical PCA coordinates system (left) and intended coordinate system (right) that complies with FIG. 2;

FIG. 5 Comparison of extracted azimuth source directions using the simplified method and the tangent method;

FIG. 6 shows exemplary curves 6a, 6b and 6c related to altering panning directions by naive HOA encoding of two-channel content, for two loudspeaker channels that are 60° apart.

FIG. 7 illustrates an exemplary method for converting two-channel stereo based content to 3D audio scene and object based content.

FIG. 8 illustrates an exemplary apparatus configured to convert two-channel stereo based content to 3D audio scene and object based content.

DESCRIPTION OF EMBODIMENTS

Even if not explicitly described, the following embodiments may be employed in any combination or sub-combination.

FIG. 1 illustrates an exemplary HOA upconverter 11. The HOA upconverter 11 may receive a two-channel stereo signal $x(t)$ 10 and may further receive an input parameter set vector $p_c$ 12. The HOA upconverter 11 then determines a HOA signal $b(t)$ 13 having $(N+1)^2$ coefficient sequences for encoding spatial audio information and a centre channel object signal $o_c(t)$ 14 for encoding a static object. In one example, HOA upconverter 11 may be implemented as part of a computing device that is adapted to perform the processing carried out by each of said respective units.

FIG. 2 shows a spherical coordinate system, in which the x axis points to the frontal position, the y axis points to the left, and the z axis points to the top. A position in space $x=(r,\theta,\phi)^T$ is represented by a radius $r>0$ (i.e. the distance to the coordinate origin), an inclination angle $\theta \in [0,\pi]$ measured from the polar axis z and an azimuth angle $\phi \in [0,2\pi[$ measured counter-clockwise in the x-y plane from the x axis. $(\cdot)^T$ denotes a transposition. The sound pressure is expressed in HOA as a function of these spherical coordinates and spatial frequency

${k = {\frac{\omega}{c} = \frac{2\pi \; f}{c}}},$

wherein c is the speed of sound waves in air.
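As an illustration of the coordinate convention of FIG. 2, the following minimal sketch converts a spherical position (r, θ, ϕ) into Cartesian coordinates. The function name `spherical_to_cartesian` is hypothetical and not part of the disclosure.

```python
import math

def spherical_to_cartesian(r, theta, phi):
    """Convert radius r, inclination theta (measured from the z axis) and
    azimuth phi (counter-clockwise from the x axis in the x-y plane)
    into Cartesian coordinates (x, y, z)."""
    x = r * math.sin(theta) * math.cos(phi)
    y = r * math.sin(theta) * math.sin(phi)
    z = r * math.cos(theta)
    return x, y, z

# Unit vectors for the three reference directions of FIG. 2.
front = spherical_to_cartesian(1.0, math.pi / 2, 0.0)        # along +x
left = spherical_to_cartesian(1.0, math.pi / 2, math.pi / 2)  # along +y
top = spherical_to_cartesian(1.0, 0.0, 0.0)                   # along +z
```

With this convention, an azimuth of zero at an inclination of π/2 corresponds to the frontal position, which is the direction later used for the centre channel object.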

The following definitions are used in this application (see also FIG. 2). Bold lowercase letters indicate a vector and bold uppercase letters indicate a matrix. For brevity, discrete time and frequency indices $t$, $\hat{t}$, $k$ are often omitted if allowed by the context.

TABLE 1

| No. | Symbol | Description | Dimension |
|---|---|---|---|
| 1 | $x(t)$ | Input two-channel stereo signal, $x(t) = [x_1(t), x_2(t)]^T$, where $t$ indicates a sample value related to the sampling frequency $f_s$ | 2 |
| 2 | $b(t)$ | Output HOA signal with HOA order $N$, $b(t) = [\dot{b}_1(t), \ldots, \dot{b}_{(N+1)^2}(t)]^T = [b_0^0(t), b_1^{-1}(t), \ldots, b_N^N(t)]^T$ | $(N+1)^2$ |
| 3 | $o_c(t)$ | Output centre channel object signal | 1 |
| 4 | $p_c$ | Input parameter vector with control values: stage_width, center_channel_capture_width $c_W$, maximum HOA order index $N$, ambient gains $g_L$ (dimension $L$), direct_sound_encoding_elevation $\theta_S$ | |
| 5 | $\hat{\Omega}$ | A spherical position vector according to FIG. 2, $\hat{\Omega} = [r, \theta, \phi]$ with radius $r$, inclination $\theta$ and azimuth $\phi$ | |
| 6 | $\Omega$ | Spherical direction vector $\Omega = [\theta, \phi]$ | |
| 7 | $\varphi_x$ | Ideal loudspeaker position azimuth angle related to signal $x_1$, assuming that $-\varphi_x$ is the position related to $x_2$ | |
| 8 | | T/F domain variables: | |
| 9 | $x(\hat{t},k)$, $b(\hat{t},k)$, $o_c(\hat{t},k)$ | Input and output signals in complex T/F domain, where $\hat{t}$ indicates the discrete temporal index and $k$ the discrete frequency index | 2, $(N+1)^2$, 1 |
| 10 | $s(\hat{t},k)$ | Extracted directional signal component | 1 |
| 11 | $a(\hat{t},k)$ | Gain vector that mixes the directional components into $x(\hat{t},k)$, $a = [a_1, a_2]^T$ | 2 |
| 12 | $\varphi_s(\hat{t},k)$ | Azimuth angle of virtual source direction of $s(\hat{t},k)$ | 1 |
| 13 | $n(\hat{t},k)$ | Extracted ambient signal components, $n = [n_1, n_2]^T$ | 2 |
| 14 | $P_s(\hat{t},k)$ | Estimated power of directional component | |
| 15 | $P_N(\hat{t},k)$ | Estimated power of ambient components $n_1$, $n_2$ | |
| 16 | $C(\hat{t},k)$ | Correlation / covariance matrix, $C(\hat{t},k) = E(x(\hat{t},k)\,x(\hat{t},k)^H)$, with $E(\,)$ denoting the expectation operator | $2 \times 2$ |
| 17 | $\overset{...}{n}(\hat{t},k)$ | Ambient component vector consisting of $L$ ambience channels | $L$ |
| 18 | $y_s(\hat{t},k)$ | Spherical harmonics vector $y_s = [Y_0^0(\theta_s, \phi_s), Y_1^{-1}(\theta_s, \phi_s), \ldots, Y_N^N(\theta_s, \phi_s)]^T$ to encode $s$ to HOA, where $\theta_s$, $\phi_s$ is the encoding direction of the directional component, $\phi_s = \text{stage\_width} \cdot \varphi_s$ | |
| 19 | $Y_n^m(\theta, \phi)$ | Spherical Harmonic (SH) of order $n$ and degree $m$. See [1] and section HOA format description for details. All considerations are valid for N3D normalised SHs. | $(N+1)^2$ |
| 20 | $\Psi_{\overset{...}{n}}$ | Mode matrix to encode the ambient component vector $\overset{...}{n}$ to HOA, $\Psi_{\overset{...}{n}} = [y_{\overset{...}{n}_1}, \ldots, y_{\overset{...}{n}_L}]$, $y_{\overset{...}{n}_l} = [Y_0^0(\theta_l, \phi_l), Y_1^{-1}(\theta_l, \phi_l), \ldots, Y_N^N(\theta_l, \phi_l)]^T$ | $(N+1)^2 \times L$ |
| 21 | $b_s(\hat{t},k)$, $b_{\overset{...}{n}}(\hat{t},k)$ | Directional HOA component and diffuse HOA component, respectively | |

Initialisation

In one example, an initialisation may include providing to, or receiving by, a method or a device a two-channel stereo signal $x(t)$ and control parameters $p_c$ (e.g., the two-channel stereo signal $x(t)$ 10 and the input parameter set vector $p_c$ 12 illustrated in FIG. 1). The parameter $p_c$ may include one or more of the following elements:

-   stage_width element that represents a factor for manipulating source directions of extracted directional sounds (e.g., with a typical value range from 0.5 to 3);
-   center_channel_capture_width $c_W$ element that relates to setting an interval (e.g., in the range 0 to 10 degrees) in which extracted direct sounds will be re-rendered to a centre channel object signal; a negative $c_W$ value will defeat this channel, so that zero PCM values will be the output of $o_c(t)$; a positive value of $c_W$ will mean that all direct sounds will be rendered to the centre channel if their manipulated source direction is in the interval $[-c_W, c_W]$;
-   max HOA order index $N$ element that defines the HOA order of the output HOA signal $b(t)$ that will have $(N+1)^2$ HOA coefficient channels;
-   ambient gains $g_L$ elements, i.e. $L$ values that are used for rating the derived ambient signals $\overset{...}{n}(\hat{t},k)$ before HOA encoding; these gains (e.g. in the range 0 to 2) manipulate image sharpness and spaciousness;
-   direct_sound_encoding_elevation $\theta_S$ element (e.g. in the range −10 to +30 degrees) that sets the virtual height when encoding direct sources to HOA.

The elements of parameter $p_c$ may be updated during operation of a system, for example by updating a smooth envelope of these elements or parameters.
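The parameter set described above can be pictured as a small record type. The following is a minimal sketch only; the class name `UpmixParameters`, its field names and its helper methods are hypothetical, and the example values are merely picked from the ranges mentioned in the text.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class UpmixParameters:
    """Hypothetical container for the control parameter vector p_c."""
    stage_width: float                      # factor applied to source azimuths
    center_channel_capture_width: float     # c_W, in degrees
    max_hoa_order: int                      # N
    ambient_gains: List[float]              # g_L, one gain per ambient channel
    direct_sound_encoding_elevation: float  # theta_S, in degrees

    def num_hoa_channels(self) -> int:
        # The output HOA signal has (N+1)^2 coefficient channels.
        return (self.max_hoa_order + 1) ** 2

    def centre_channel_enabled(self) -> bool:
        # A negative width defeats the centre channel. Treating a zero
        # width as disabled as well is an assumption of this sketch.
        return self.center_channel_capture_width > 0.0

params = UpmixParameters(
    stage_width=1.5,
    center_channel_capture_width=5.0,
    max_hoa_order=3,
    ambient_gains=[1.0] * 6,
    direct_sound_encoding_elevation=0.0,
)
```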

FIG. 3 illustrates an exemplary artistic interference HOA upconverter 31. The HOA upconverter 31 may receive a two-channel stereo signal $x(t)$ 34 and an artistic control parameter set vector $p_c$ 35. The HOA upconverter 31 may determine an output HOA signal $b(t)$ 36 having $(N+1)^2$ coefficient sequences and a centre channel object signal $o_c(t)$ 37 that are provided to a rendering unit 32, the output signal of which is provided to a monitoring unit 33. In one example, the HOA upconverter 31 may be implemented as part of a computing device that is adapted to perform the processing carried out by each of said respective units.

T/F Analysis Filter Bank

A two-channel stereo signal $x(t)$ may be transformed by HOA upconverter 11 or 31 into the time/frequency (T/F) domain by a filter bank. In one embodiment a fast Fourier transform (FFT) is used with 50% overlapping blocks of 4096 samples. Smaller frequency resolutions may be utilized, although there may be a trade-off between processing speed and separation performance. The transformed input signal may be denoted as $x(\hat{t},k)$ in T/F domain, where $\hat{t}$ relates to the processed block and $k$ denotes the frequency band or bin index.
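The block partitioning and transform can be sketched as follows for one channel. This is an illustration under stated assumptions: a sine analysis window is assumed (the text only names a sine window for synthesis), a naive DFT stands in for the FFT, a block size of 8 is used instead of 4096 to keep the example small, and all function names are hypothetical.

```python
import cmath
import math

def sine_window(size):
    # Assumed analysis window; not specified in the text for this stage.
    return [math.sin(math.pi * (i + 0.5) / size) for i in range(size)]

def split_blocks(signal, size):
    """Partition a channel into windowed blocks with 50% overlap."""
    hop = size // 2
    window = sine_window(size)
    return [[window[i] * signal[start + i] for i in range(size)]
            for start in range(0, len(signal) - size + 1, hop)]

def dft(block):
    """Naive discrete Fourier transform, standing in for an FFT."""
    n = len(block)
    return [sum(block[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

channel = [math.sin(0.3 * t) for t in range(32)]
# tiles[block_index][bin_index] is one T/F tile of this channel.
tiles = [dft(block) for block in split_blocks(channel, size=8)]
```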

T/F Domain Signal Analysis

For each T/F tile of the input two-channel stereo signal $x(t)$, a correlation matrix may be determined, for example based on:

$\begin{matrix}{{{C\left( {\hat{t},k} \right)} = {{E\left( {{x\left( {\hat{t},k} \right)}{x\left( {\hat{t},k} \right)}^{H}} \right)} = \begin{bmatrix}{c_{11}\left( {\hat{t},k} \right)} & {c_{12}\left( {\hat{t},k} \right)} \\{c_{21}\left( {\hat{t},k} \right)} & {c_{22}\left( {\hat{t},k} \right)}\end{bmatrix}}},} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 1}}\end{matrix}$

wherein $E(\,)$ denotes the expectation operator. The expectation can be determined based on a mean value over $t_{num}$ temporal T/F values (index $\hat{t}$) by using a ring buffer or an IIR smoothing filter.
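As an illustration of the IIR smoothing option, the following sketch updates the entries of the correlation matrix for one frequency bin with a first-order recursive average. The function name `smooth_covariance` and the smoothing constant `alpha` are assumptions made for this example; the text does not specify the filter.

```python
def smooth_covariance(previous, x1, x2, alpha):
    """One recursive update of (c11, c12, c22) for a single frequency bin.

    previous: tuple (c11, c12, c22) from the preceding block
    x1, x2:   complex T/F values of the two stereo channels
    alpha:    hypothetical smoothing constant in [0, 1)
    """
    c11, c12, c22 = previous
    c11 = alpha * c11 + (1.0 - alpha) * abs(x1) ** 2
    c12 = alpha * c12 + (1.0 - alpha) * x1 * x2.conjugate()
    c22 = alpha * c22 + (1.0 - alpha) * abs(x2) ** 2
    return c11, c12, c22  # c21 is the complex conjugate of c12

state = (0.0, 0j, 0.0)
for x1, x2 in [(1 + 1j, 1 - 1j)] * 3:
    state = smooth_covariance(state, x1, x2, alpha=0.5)
```

A ring buffer that averages the last $t_{num}$ outer products would serve the same purpose with a finite memory.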

The eigenvalues of the correlation matrix may then be determined, for example based on:

$\lambda_1(\hat{t},k) = \tfrac{1}{2}\left(c_{22} + c_{11} + \sqrt{(c_{11} - c_{22})^2 + 4|c_{r12}|^2}\right) \qquad \text{Equation No. 2a}$

$\lambda_2(\hat{t},k) = \tfrac{1}{2}\left(c_{22} + c_{11} - \sqrt{(c_{11} - c_{22})^2 + 4|c_{r12}|^2}\right) \qquad \text{Equation No. 2b}$

wherein $c_{r12} = \mathrm{real}(c_{12})$ denotes the real part of $c_{12}$. The indices $(\hat{t},k)$ may be omitted in certain notations, e.g., as within Equation Nos. 2a and 2b.

For each tile, based on the correlation matrix, the following may be determined: ambient power, directional power, elements of a gain vector that mixes the directional components, and an azimuth angle of the virtual source direction of $s(\hat{t},k)$ to be extracted.

In one example, the ambient power $P_N(\hat{t},k)$ may be determined based on the second eigenvalue, such as for example:

$P_N(\hat{t},k) = \lambda_2(\hat{t},k) \qquad \text{Equation No. 3}$

In another example, the directional power $P_s(\hat{t},k)$ may be determined based on the first eigenvalue and the ambient power, such as for example:

$P_s(\hat{t},k) = \lambda_1(\hat{t},k) - P_N(\hat{t},k) \qquad \text{Equation No. 4}$
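Equation Nos. 2a to 4 can be sketched for a single tile as follows. The function name `eigen_powers` is hypothetical; the arithmetic follows the equations above.

```python
import math

def eigen_powers(c11, c22, c12):
    """Return (lambda1, lambda2, P_s, P_N) for one T/F tile."""
    cr12 = c12.real  # only the real part of c12 enters Equation Nos. 2a/2b
    root = math.sqrt((c11 - c22) ** 2 + 4.0 * cr12 ** 2)
    lam1 = 0.5 * (c22 + c11 + root)  # Equation No. 2a
    lam2 = 0.5 * (c22 + c11 - root)  # Equation No. 2b
    p_n = lam2                       # Equation No. 3
    p_s = lam1 - p_n                 # Equation No. 4
    return lam1, lam2, p_s, p_n

# Fully correlated channels of equal level: all power is directional.
correlated = eigen_powers(1.0, 1.0, 1.0 + 0j)
# Uncorrelated channels of equal level: all power is ambient.
uncorrelated = eigen_powers(2.0, 2.0, 0j)
```

Note that the directional power equals the square-root term of Equation Nos. 2a and 2b, since it is the difference of the two eigenvalues.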

In another example, elements of a gain vector $a(\hat{t},k) = [a_1(\hat{t},k), a_2(\hat{t},k)]^T$ that mixes the directional components into $x(\hat{t},k)$ may be determined based on:

$\begin{matrix}{{{a_{1}\left( {\hat{t},k} \right)} = \frac{1}{\sqrt{1 + {A\left( {\hat{t},k} \right)}^{2}}}},{{a_{2}\left( {\hat{t},k} \right)} = \frac{A\left( {\hat{t},k} \right)}{\sqrt{1 + {A\left( {\hat{t},k} \right)}^{2}}}},} & {{{Equation}\mspace{14mu} {{No}.\mspace{11mu} 5}}\;} \\{\mspace{79mu} {{{{with}\mspace{14mu} {A\left( {\hat{t},k} \right)}} = \frac{{\lambda_{1}\left( {\hat{t},k} \right)} - c_{11}}{c_{r\; 12}}};}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 5}a}\end{matrix}$

The azimuth angle $\varphi_s(\hat{t},k)$ of the virtual source direction of $s(\hat{t},k)$ to be extracted may be determined based on:

$\varphi_s(\hat{t},k) = \left(\operatorname{atan}\left(\frac{1}{A(\hat{t},k)}\right) - \frac{\pi}{4}\right)\frac{\varphi_x}{\pi/4} \qquad \text{Equation No. 6}$

with $\varphi_x$ giving the loudspeaker position azimuth angle related to signal $x_1$ in radian (assuming that $-\varphi_x$ is the position related to $x_2$).
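The mixing gains of Equation Nos. 5 and 5a and the azimuth of Equation No. 6 can be sketched as below. The function names are hypothetical, and raising an error for a vanishing real cross-correlation is a choice made for this sketch only; the text does not state how that case is handled.

```python
import math

def mixing_gains(lam1, c11, cr12):
    """Return (a1, a2, A) according to Equation Nos. 5 and 5a."""
    if cr12 == 0.0:
        # Not covered by the text; this sketch simply refuses the case.
        raise ValueError("real cross-correlation must be non-zero")
    ratio = (lam1 - c11) / cr12          # A, Equation No. 5a
    norm = math.sqrt(1.0 + ratio * ratio)
    return 1.0 / norm, ratio / norm, ratio

def source_azimuth(ratio, speaker_azimuth):
    """Azimuth of the virtual source in radians, Equation No. 6."""
    return (math.atan(1.0 / ratio) - math.pi / 4) * speaker_azimuth / (math.pi / 4)

# Equal level in both channels: the source sits in the centre.
a1_c, a2_c, ratio_c = mixing_gains(2.0, 1.0, 1.0)
centre = source_azimuth(ratio_c, math.pi / 6)

# Second channel at half the amplitude of the first: the source moves
# towards the loudspeaker of x1 (positive azimuth).
a1_l, a2_l, ratio_l = mixing_gains(1.25, 1.0, 0.5)
towards_x1 = source_azimuth(ratio_l, math.pi / 6)
```

In the limits, a ratio approaching zero yields the loudspeaker azimuth of $x_1$ and a very large ratio yields the mirrored azimuth of $x_2$, which matches the assumption stated for Equation No. 6.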

Directional and Ambient Signal Extraction

In this subsection, for better readability, the indices $(\hat{t},k)$ are omitted. Processing is performed for each T/F tile $(\hat{t},k)$.

For each T/F tile, a first directional intermediate signal is extracted based on a gain, such as, for example:

$\begin{matrix}{\hat{s}:={g^{T}x}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 7}a} \\{{{with}\mspace{14mu} g} = \begin{bmatrix}\frac{a_{1}P_{s}}{P_{s} + P_{N}} \\\frac{a_{2}P_{s}}{P_{s} + P_{N}}\end{bmatrix}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 7}b}\end{matrix}$

The intermediate signal may be scaled in order to derive the directional signal, such as for example, based on:

$\begin{matrix}{s = {\sqrt{\frac{P_{s}}{{\left( {{g_{1}a_{1}} + {g_{2}a_{2}}} \right)^{2}P_{s}} + {\left( {g_{1}^{2} + g_{2}^{2}} \right)P_{N}}}}\hat{s}}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 8}}\end{matrix}$
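Equation Nos. 7a, 7b and 8 can be sketched for one tile as follows. The function name `extract_directional` is hypothetical, powers are assumed to be non-negative, and returning zero when a denominator vanishes is a guard added for this sketch only.

```python
import math

def extract_directional(x1, x2, a1, a2, p_s, p_n):
    """Directional signal s of one T/F tile (Equation Nos. 7a, 7b, 8)."""
    total = p_s + p_n
    if total <= 0.0:
        return 0j  # silent tile; guard added in this sketch
    g1 = a1 * p_s / total               # Equation No. 7b
    g2 = a2 * p_s / total
    s_hat = g1 * x1 + g2 * x2           # Equation No. 7a
    denom = (g1 * a1 + g2 * a2) ** 2 * p_s + (g1 * g1 + g2 * g2) * p_n
    if denom <= 0.0:
        return 0j  # no directional power; guard added in this sketch
    return math.sqrt(p_s / denom) * s_hat  # Equation No. 8

a = 1.0 / math.sqrt(2.0)
# Purely directional tile, x = a * s with s = sqrt(2): s is recovered.
s_pure = extract_directional(1 + 0j, 1 + 0j, a, a, 2.0, 0.0)
# Equal directional and ambient power, source panned fully to x1.
s_mixed = extract_directional(1 + 0j, 0j, 1.0, 0.0, 1.0, 1.0)
```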

The two elements of an ambient signal $n=[n_1,n_2]^T$ are derived by first calculating intermediate values based on the ambient power, directional power, and the elements of the gain vector:

$\begin{matrix}{{\hat{n}}_{1} = {{h^{T}x\mspace{14mu} {with}\mspace{14mu} h} = \begin{bmatrix}\frac{{a_{2}^{2}P_{s}} + P_{N}}{P_{s} + P_{N}} \\\frac{{- a_{1}}a_{2}P_{s}}{P_{s} + P_{N}}\end{bmatrix}}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 9}a} \\{{\hat{n}}_{2} = {{w^{T}x\mspace{14mu} {with}\mspace{14mu} w} = \begin{bmatrix}\frac{{- a_{1}}a_{2}P_{s}}{P_{s} + P_{N}} \\\frac{{a_{1}^{2}P_{s}} + P_{N}}{P_{s} + P_{N}}\end{bmatrix}}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 9}b}\end{matrix}$

followed by scaling of these values:

$\begin{matrix}{n_{1} = {\sqrt{\frac{P_{N}}{{\left( {{h_{1}a_{1}} + {h_{2}a_{2}}} \right)^{2}P_{s}} + {\left( {h_{1}^{2} + h_{2}^{2}} \right)P_{N}}}}{\hat{n}}_{1}}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 10}a} \\{n_{2} = {\sqrt{\frac{P_{N}}{{\left( {{w_{1}a_{1}} + {w_{2}a_{2}}} \right)^{2}P_{s}} + {\left( {w_{1}^{2} + w_{2}^{2}} \right)P_{N}}}}{\hat{n}}_{2}}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 10}b}\end{matrix}$
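The ambient extraction of Equation Nos. 9a to 10b can be sketched in the same manner. The function name `extract_ambient` is hypothetical, powers are assumed to be non-negative, and the zero-denominator guard is an addition of this sketch.

```python
import math

def extract_ambient(x1, x2, a1, a2, p_s, p_n):
    """Ambient signals (n1, n2) of one T/F tile (Equation Nos. 9a to 10b)."""
    total = p_s + p_n
    if total <= 0.0:
        return 0j, 0j  # silent tile; guard added in this sketch
    h = ((a2 * a2 * p_s + p_n) / total, -a1 * a2 * p_s / total)  # Eq. 9a
    w = (-a1 * a2 * p_s / total, (a1 * a1 * p_s + p_n) / total)  # Eq. 9b

    def scaled(v):
        raw = v[0] * x1 + v[1] * x2
        denom = ((v[0] * a1 + v[1] * a2) ** 2 * p_s
                 + (v[0] ** 2 + v[1] ** 2) * p_n)
        if denom <= 0.0:
            return 0j  # guard added in this sketch
        return math.sqrt(p_n / denom) * raw  # Equation Nos. 10a / 10b

    return scaled(h), scaled(w)

a = 1.0 / math.sqrt(2.0)
# Purely ambient tile: the input channels pass through unchanged.
n_only = extract_ambient(1 + 2j, 3 - 1j, a, a, 0.0, 1.0)
# Purely directional tile: no ambient signal remains.
n_none = extract_ambient(1 + 0j, 1 + 0j, a, a, 2.0, 0.0)
```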

Processing of Directional Components

A new source direction $\phi_s(\hat{t},k)$ may be determined based on the stage_width factor and, for example, the azimuth angle $\varphi_s(\hat{t},k)$ of the virtual source direction (e.g., as described in connection with Equation No. 6). The new source direction may be determined based on:

$\phi_s(\hat{t},k) = \text{stage\_width} \cdot \varphi_s(\hat{t},k) \qquad \text{Equation No. 11}$

A centre channel object signal $o_c(\hat{t},k)$ and/or a directional HOA signal $b_s(\hat{t},k)$ in the T/F domain may be determined based on the new source direction. In particular, the new source direction $\phi_s(\hat{t},k)$ may be compared to a center_channel_capture_width $c_W$.

If $|\phi_s(\hat{t},k)| < c_W$, then

$o_c(\hat{t},k) = s(\hat{t},k) \text{ and } b_s(\hat{t},k) = 0 \qquad \text{Equation No. 12a}$

else:

$o_c(\hat{t},k) = 0 \text{ and } b_s(\hat{t},k) = y_s(\hat{t},k)\,s(\hat{t},k) \qquad \text{Equation No. 12b}$

where $y_s(\hat{t},k)$ is the spherical harmonic encoding vector derived from $\phi_s(\hat{t},k)$ and a direct_sound_encoding_elevation $\theta_S$. In one example, the $y_s(\hat{t},k)$ vector may be determined based on the following:

$y_s(\hat{t},k) = [Y_0^0(\theta_S, \phi_s), Y_1^{-1}(\theta_S, \phi_s), \ldots, Y_N^N(\theta_S, \phi_s)]^T \qquad \text{Equation No. 13}$
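The routing of Equation Nos. 11 to 13 can be sketched for HOA order N = 1 as follows. Several points are assumptions of this sketch rather than statements of the text: the spherical harmonics are taken as real-valued, N3D normalised and ordered as $Y_0^0, Y_1^{-1}, Y_1^0, Y_1^1$; the capture width and the azimuth are given in the same unit (radians); the encoding angle is passed as an inclination; and all function names are hypothetical.

```python
import math

def sh_first_order(theta, phi):
    """Assumed real N3D spherical harmonics up to order 1."""
    sin_theta = math.sin(theta)
    return [
        1.0,                                         # Y_0^0
        math.sqrt(3.0) * sin_theta * math.sin(phi),  # Y_1^-1
        math.sqrt(3.0) * math.cos(theta),            # Y_1^0
        math.sqrt(3.0) * sin_theta * math.cos(phi),  # Y_1^1
    ]

def route_directional(s, azimuth, stage_width, capture_width, inclination):
    """Return (o_c, b_s) for one tile."""
    new_azimuth = stage_width * azimuth          # Equation No. 11
    if abs(new_azimuth) < capture_width:         # Equation No. 12a
        return s, [0j] * 4
    y_s = sh_first_order(inclination, new_azimuth)  # Equation No. 13
    return 0j, [y * s for y in y_s]              # Equation No. 12b

# A source close to the centre is captured by the centre object.
o_centre, b_centre = route_directional(1 + 0j, 0.05, 1.0, 0.1, math.pi / 2)
# A source at 30 degrees, widened by a factor of 2, is encoded to HOA.
o_side, b_side = route_directional(1 + 0j, math.pi / 6, 2.0, 0.1, math.pi / 2)
```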

Processing of Ambient HOA Signal

The ambient HOA signal $b_{\overset{...}{n}}(\hat{t},k)$ may be determined based on the additional ambient signal channels $\overset{...}{n}(\hat{t},k)$. For example, the ambient HOA signal $b_{\overset{...}{n}}(\hat{t},k)$ may be determined based on:

$b_{\overset{...}{n}}(\hat{t},k) = \Psi_{\overset{...}{n}}\,\operatorname{diag}(g_L)\,\overset{...}{n}(\hat{t},k) \qquad \text{Equation No. 14}$

where $\operatorname{diag}(g_L)$ is a square diagonal matrix with ambient gains $g_L$ on its main diagonal, $\overset{...}{n}(\hat{t},k)$ is a vector of ambient signals derived from $n$, and $\Psi_{\overset{...}{n}}$ is a mode matrix for encoding $\overset{...}{n}(\hat{t},k)$ to HOA. The mode matrix may be determined based on:

$\Psi_{\overset{...}{n}} = [y_{\overset{...}{n}_1}, \ldots, y_{\overset{...}{n}_L}], \quad y_{\overset{...}{n}_l} = [Y_0^0(\theta_l, \phi_l), Y_1^{-1}(\theta_l, \phi_l), \ldots, Y_N^N(\theta_l, \phi_l)]^T \qquad \text{Equation No. 15}$

wherein $L$ denotes the number of components in $\overset{...}{n}(\hat{t},k)$.

In one embodiment L=6 is selected with the following positions:

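TABLE 2

| $l$ (direction number, ambient channel number) | $\theta_l$ Inclination / rad | $\phi_l$ Azimuth / rad |
|---|---|---|
| 1 | $\pi/2$ | $30\pi/180$ |
| 2 | $\pi/2$ | $-30\pi/180$ |
| 3 | $\pi/2$ | $105\pi/180$ |
| 4 | $\pi/2$ | $-105\pi/180$ |
| 5 | $\pi/2$ | $180\pi/180$ |
| 6 | $0$ | $0$ |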
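The six positions of Table 2 can be turned into a mode matrix and applied as in Equation No. 14. The sketch below is limited to HOA order N = 1 and assumes real-valued, N3D normalised spherical harmonics ordered as $Y_0^0, Y_1^{-1}, Y_1^0, Y_1^1$; the function and constant names are hypothetical.

```python
import math

# (inclination, azimuth) in radians for the six ambient channels of Table 2.
AMBIENT_POSITIONS = [
    (math.pi / 2, math.radians(30)),
    (math.pi / 2, math.radians(-30)),
    (math.pi / 2, math.radians(105)),
    (math.pi / 2, math.radians(-105)),
    (math.pi / 2, math.radians(180)),
    (0.0, 0.0),
]

def sh_first_order(theta, phi):
    """Assumed real N3D spherical harmonics up to order 1."""
    sin_theta = math.sin(theta)
    return [1.0,
            math.sqrt(3.0) * sin_theta * math.sin(phi),
            math.sqrt(3.0) * math.cos(theta),
            math.sqrt(3.0) * sin_theta * math.cos(phi)]

def mode_matrix(positions):
    """Rows are HOA coefficients, columns are ambient channels (Eq. 15)."""
    columns = [sh_first_order(theta, phi) for theta, phi in positions]
    return [[column[row] for column in columns] for row in range(4)]

def encode_ambient(psi, gains, ambient):
    """Apply the gains and the mode matrix to the ambient vector (Eq. 14)."""
    weighted = [g * n for g, n in zip(gains, ambient)]
    return [sum(row[l] * weighted[l] for l in range(len(weighted)))
            for row in psi]

psi = mode_matrix(AMBIENT_POSITIONS)
# Equal ambient signals in the two front channels only.
b_ambient = encode_ambient(psi, [1.0] * 6, [1.0, 1.0, 0.0, 0.0, 0.0, 0.0])
```

For this symmetric input the left/right component cancels and only the omnidirectional and front/back components remain.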

The vector of ambient signals is determined based on:

$\begin{matrix}{{\overset{...}{n}\left( {\hat{t},k} \right)} = {\begin{bmatrix}1 & 0 \\0 & 1 \\{F_{s}(k)} & 0 \\0 & {F_{s}(k)} \\{F_{B}(k)} & {F_{B}(k)} \\{F_{T}(k)} & {F_{T}(k)}\end{bmatrix}n}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 16}}\end{matrix}$

with weighting (filtering) factors $F_{i}(k) \in \mathbb{C}^{1}$, wherein

$F_{i}(k) = a_{i}(k)\, e^{-2\pi i k \frac{d_{i}}{\mathrm{fftsize}}}, \quad d_{i},\ a_{i}(k) \in \mathbb{R},$   Equation No. 17

where d_(i) is a delay in samples, and a_(i)(k) is a spectral weighting factor (e.g. in the range 0 to 1).
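As a minimal sketch only, the expansion of the two ambient T/F channels into six channels according to Equation Nos. 16 and 17 may look as follows for a single T/F tile. The function names, the delays and the weights are hypothetical values selected for illustration and are not taken from the described embodiments.

```python
import cmath
import math


def delay_filter(k, delay, weight, fft_size):
    """F_i(k) = a_i(k) * exp(-2*pi*i*k*d_i/fftsize), cf. Eq. 17."""
    return weight * cmath.exp(-2j * math.pi * k * delay / fft_size)


def additional_ambient_channels(n1, n2, k, fft_size, delays, weights):
    """Apply the 6x2 matrix of Eq. 16 to the ambient pair (n1, n2)."""
    f_s = delay_filter(k, delays["S"], weights["S"], fft_size)
    f_b = delay_filter(k, delays["B"], weights["B"], fft_size)
    f_t = delay_filter(k, delays["T"], weights["T"], fft_size)
    return [
        n1,               # row [1, 0]
        n2,               # row [0, 1]
        f_s * n1,         # row [F_s, 0]
        f_s * n2,         # row [0, F_s]
        f_b * (n1 + n2),  # row [F_B, F_B]
        f_t * (n1 + n2),  # row [F_T, F_T]
    ]


# Hypothetical delays (in samples) and spectral weights.
delays = {"S": 7, "B": 11, "T": 5}
weights = {"S": 0.8, "B": 0.6, "T": 0.5}
channels = additional_ambient_channels(
    1 + 0j, 0 + 1j, k=3, fft_size=64, delays=delays, weights=weights
)
```

Because each factor F_(i)(k) has the magnitude a_(i)(k), the delays change only the phase of a tile, which is what de-correlates the additional channels from the original pair.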

Synthesis Filter Bank

The combined HOA signal is determined based on the directional HOA signal $b_{s}(\hat{t}, k)$ and the ambient HOA signal $b_{\dddot{n}}(\hat{t}, k)$. For example:

$b(\hat{t}, k) = b_{s}(\hat{t}, k) + b_{\dddot{n}}(\hat{t}, k)$   Equation No. 18

The T/F signals b({circumflex over (t)}, k) and o_(c)({circumflex over (t)}, k) are transformed back to time domain by an inverse filter bank to derive signals b(t) and o_(c)(t). For example, the T/F signals may be transformed based on an inverse fast Fourier transform (IFFT) and an overlap-add procedure using a sine window.
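The inverse transform with overlap-add may be sketched as follows. This is an illustration under stated assumptions: a 50% block overlap, a sine window applied both at analysis and at synthesis, and a naive DFT standing in for the FFT/IFFT so that the example needs no external library. All function names are hypothetical.

```python
import cmath
import math


def dft(block):
    """Naive DFT, used here only as a stand-in for an FFT."""
    n = len(block)
    return [sum(block[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT returning the real part (real input assumed)."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def sine_window(n):
    return [math.sin(math.pi * (t + 0.5) / n) for t in range(n)]


def analyse(x, block_len):
    """Partition x into 50% overlapping windowed blocks and transform them."""
    hop = block_len // 2
    w = sine_window(block_len)
    return [dft([w[t] * x[start + t] for t in range(block_len)])
            for start in range(0, len(x) - block_len + 1, hop)]


def synthesise(tiles, block_len, out_len):
    """Inverse transform each block, window it again and overlap-add."""
    hop = block_len // 2
    w = sine_window(block_len)
    y = [0.0] * out_len
    for i, spec in enumerate(tiles):
        block = idft(spec)
        for t in range(block_len):
            y[i * hop + t] += w[t] * block[t]
    return y


x = [math.sin(0.3 * t) for t in range(64)]
y = synthesise(analyse(x, 16), 16, len(x))
```

With these assumptions the squared sine windows of neighbouring blocks sum to one, so every sample covered by two blocks is reconstructed; only the first and last half block are attenuated.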

Processing of Upmixed Signals

The signals b(t) and o_(c)(t) and related metadata, namely the maximum HOA order index N and the direction

$\Omega_{o_{c}} = \left\lbrack {\frac{\pi}{2},0} \right\rbrack$

of signal o_(c)(t), may be stored or transmitted based on any format, including a standardized format such as an MPEG-H 3D audio compression codec. These can then be rendered to individual loudspeaker setups on demand.

Primary Ambient Decomposition in T/F Domain

In this section the detailed deduction of the PAD algorithm is presented, including the assumptions about the nature of the signals. Because all considerations take place in T/F domain, the indices ({circumflex over (t)}, k) are omitted.

Signal Model, Model Assumptions and Covariance Matrix

The following signal model in time frequency domain (T/F) is assumed:

x=a s+n,   Equation No. 19a

x ₁ =a ₁ s+n ₁,   Equation No. 19b

x ₂ =a ₂ s+n ₂,   Equation No. 19c

√{square root over (a ₁ ² +a ₂ ²)}=1   Equation No. 19d

The covariance matrix becomes the correlation matrix if signals with zero mean are assumed, which is a common assumption related to audio signals:

$\begin{matrix}{C = {{E\left( {xx}^{H} \right)} = \begin{bmatrix}c_{11} & c_{12} \\c_{12}^{*} & c_{22}\end{bmatrix}}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 20}}\end{matrix}$

wherein E( ) is the expectation operator which can be approximated by deriving the mean value over T/F tiles.

Next the Eigenvalues of the covariance matrix are derived. They are defined by

λ_(1,2)(C)={x: det(C−x I)=0},   Equation No. 21

where I denotes the identity matrix.

Applied to the covariance matrix:

$\begin{matrix}{{\det \left( \begin{bmatrix}{c_{11} - x} & c_{12} \\c_{12}^{*} & {c_{22} - x}\end{bmatrix} \right)} = {\left( {c_{11} - x} \right)\left( {c_{22} - x} \right) - \left| c_{12} \right|^{2} = 0}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 22}}\end{matrix}$

with c*₁₂ c₁₂=|c₁₂|².

The solution of λ_(1,2) is:

λ_(1,2)=1/2(c ₂₂ +c ₁₁±√{square root over ((c ₁₁ −c ₂₂)²+4|c ₁₂|²)})  Equation No. 23
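The closed-form solution of Equation No. 23 can be checked numerically against the characteristic polynomial of Equation No. 22. The following is a small sketch with hypothetical function names and arbitrary example values for c₁₁, c₂₂ and c₁₂.

```python
import math


def eigenvalues_2x2(c11, c22, c12):
    """Closed-form Eigenvalues of a 2x2 Hermitian matrix, cf. Eq. 23."""
    root = math.sqrt((c11 - c22) ** 2 + 4.0 * abs(c12) ** 2)
    return 0.5 * (c11 + c22 + root), 0.5 * (c11 + c22 - root)


def characteristic_polynomial(c11, c22, c12, x):
    """det(C - x I) as written in Eq. 22."""
    return (c11 - x) * (c22 - x) - abs(c12) ** 2


# Arbitrary example covariance entries (c11 and c22 real, c12 complex).
lam1, lam2 = eigenvalues_2x2(2.0, 1.0, 0.5 + 0.25j)
residual1 = characteristic_polynomial(2.0, 1.0, 0.5 + 0.25j, lam1)
residual2 = characteristic_polynomial(2.0, 1.0, 0.5 + 0.25j, lam2)
```

Both residuals vanish up to rounding, and the two Eigenvalues sum to the trace c₁₁+c₂₂ of the covariance matrix.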

The model assumptions and the covariance matrix are given by:

-   -   Direct and noise signals are not correlated: E(s n*_(1,2))=0
-   -   The direct power estimate is given by P_(s)=E(s s*)
-   -   The ambient (noise) component power estimates are equal: P _(N) =P _(n1) =P _(n2) =E(n ₁ n* ₁)
-   -   The ambient components are not correlated: E(n₁n*₂)=0

The model covariance becomes

$\begin{matrix}{C = \begin{bmatrix}{{\left| a_{1} \right|^{2}P_{s}} + P_{N}} & {a_{1}a_{2}^{*}P_{s}} \\{a_{1}^{*}a_{2}P_{s}} & {{\left| a_{2} \right|^{2}P_{s}} + P_{N}}\end{bmatrix}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 24}}\end{matrix}$

In the following, real positive-valued mixing coefficients a₁,a₂ and √{square root over (a₁ ²+a₂ ²)}=1 are assumed, and consequently c_(r12)=real(c₁₂).

The Eigenvalues become:

$\begin{matrix}
{\lambda_{1,2} = \frac{1}{2}\left( c_{22} + c_{11} \pm \sqrt{\left( c_{11} - c_{22} \right)^{2} + 4\left| c_{r12} \right|^{2}} \right)} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 25a}} \\
{= 0.5\left( P_{s} + 2P_{N} \pm \sqrt{P_{s}^{2}\left( a_{1}^{2} - a_{2}^{2} \right)^{2} + 4a_{1}^{2}a_{2}^{2}P_{s}^{2}} \right)} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 25b}} \\
{= 0.5\left( P_{s} + 2P_{N} \pm \sqrt{P_{s}^{2}\left( a_{1}^{2} + a_{2}^{2} \right)^{2}} \right)} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 25c}} \\
{= 0.5\left( P_{s} + 2P_{N} \pm P_{s} \right)} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 25d}}
\end{matrix}$

Estimates of ambient power and directional power

The ambient power estimate becomes:

P _(N)=λ₂=1/2(c ₂₂ +c ₁₁−√{square root over ((c ₁₁ −c ₂₂)²+4|c_(r12)|²)})   Equation No. 26

The direct sound power estimate becomes:

P _(s)=λ₁ −P _(N)=√{square root over ((c ₁₁ −c ₂₂)²+4|c_(r12)|²)}  Equation No. 27

Direction of Directional Signal Component

The ratio A of the mixing gains can be derived as:

$\begin{matrix}{A = {\frac{a_{2}}{a_{1}} = {\frac{\lambda_{1} - c_{11}}{c_{r\; 12}} = {\frac{P_{N} + P_{s} - c_{11}}{c_{r\; 12}} = {\frac{c_{22} - P_{N}}{c_{r\; 12}} = \frac{\left( {c_{22} - c_{11} + \sqrt{\left( {c_{11} - c_{22}} \right)^{2} + {4{c_{r\; 12}}^{2}}}} \right)}{2{c_{r\; 12}}}}}}}} & {{Eq}.\mspace{14mu} {No}.\mspace{14mu} 28}\end{matrix}$

With a₁ ²=1−a₂ ², and a₂ ²=1−a₁ ² it follows:

$a_{1} = {{\frac{1}{\sqrt{1 + A^{2}}}\mspace{14mu} {and}\mspace{14mu} a_{2}} = \frac{A}{\sqrt{1 + A^{2}}}}$
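As an illustrative sketch, the estimates of Equation Nos. 26 to 28 and the resulting mixing gains can be computed per T/F tile as follows. The function name is hypothetical, and the sketch assumes c_(r12)≠0; a tile with c_(r12)=0 would need separate handling that is not shown here.

```python
import math


def pad_parameters(c11, c22, c_r12):
    """Return (P_N, P_s, a1, a2) from the covariance entries of one tile."""
    root = math.sqrt((c11 - c22) ** 2 + 4.0 * c_r12 ** 2)
    p_n = 0.5 * (c11 + c22 - root)              # Eq. 26
    p_s = root                                  # Eq. 27
    ratio = (c22 - c11 + root) / (2.0 * c_r12)  # Eq. 28, A = a2 / a1
    a1 = 1.0 / math.sqrt(1.0 + ratio ** 2)
    a2 = ratio / math.sqrt(1.0 + ratio ** 2)
    return p_n, p_s, a1, a2


# Model covariance (Eq. 24) built from a1=0.8, a2=0.6, P_s=2, P_N=0.5.
c11 = 0.8 ** 2 * 2.0 + 0.5
c22 = 0.6 ** 2 * 2.0 + 0.5
c12 = 0.8 * 0.6 * 2.0
p_n, p_s, a1, a2 = pad_parameters(c11, c22, c12)
```

Feeding the model covariance of Equation No. 24 into the estimator returns the powers and mixing gains that were used to build it.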

The principal component approach includes:

The first and second Eigenvalues are related to Eigenvectors v₁,v₂ which are given in mathematical literature and in [8] by

$\begin{matrix}{V = {\left\lbrack {v_{1},v_{2}} \right\rbrack = \begin{bmatrix}{\cos \left( \hat{\phi} \right)} & {- {\sin \left( \hat{\phi} \right)}} \\{\sin \left( \hat{\phi} \right)} & {\cos \left( \hat{\phi} \right)}\end{bmatrix}}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 29}}\end{matrix}$

Here the signal x₁ would relate to the x-axis and the signal x₂ would relate to the y-axis of a Cartesian coordinate system. This would map the two channels to be 90° apart with relations: cos({circumflex over (φ)})=a₁s/s, sin({circumflex over (φ)})=a₂s/s. Thus the ratio of the mixing gains can be used to derive {circumflex over (φ)}, with:

$\begin{matrix}{A = {{\frac{a_{2}}{a_{1}}\text{:}\mspace{14mu} \hat{\phi}} = {{atan}(A)}}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 30}}\end{matrix}$

The preferred azimuth measure φ refers to an azimuth of zero placed at the half angle between the related virtual speaker channels, with the positive angle direction counter-clockwise in the mathematical sense. To translate from the above-mentioned system:

$\begin{matrix}{\phi = {{{- \hat{\phi}} + \frac{\pi}{4}} = {{{- {{atan}(A)}} + \frac{\pi}{4}} = {{{atan}\left( {1/A} \right)} - {\pi/4}}}}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 31}}\end{matrix}$

The tangent law of energy panning is defined as

$\begin{matrix}{\frac{\tan (\phi)}{\tan \left( \phi_{o} \right)} = \frac{a_{1} - a_{2}}{a_{1} + a_{2}}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 32}}\end{matrix}$

where φ₀ is the half loudspeaker spacing angle. In the model used here,

${\phi_{o} = \frac{\pi}{4}},{{\tan \left( \phi_{o} \right)} = 1.}$

It can be shown that

$\begin{matrix}{\phi = {{atan}\left( \frac{a_{1} - a_{2}}{a_{1} + a_{2}} \right)}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 33}}\end{matrix}$
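The agreement between Equation No. 31 and Equation No. 33 can be verified numerically. The sketch below uses hypothetical function names and assumes positive mixing gains with a₁²+a₂²=1.

```python
import math


def azimuth_from_ratio(ratio):
    """Eq. 31: phi = pi/4 - atan(A), with A = a2 / a1."""
    return math.pi / 4 - math.atan(ratio)


def azimuth_from_gains(a1, a2):
    """Eq. 33: phi = atan((a1 - a2) / (a1 + a2))."""
    return math.atan((a1 - a2) / (a1 + a2))


# Compare both expressions for several gain pairs (a1, a2) = (cos, sin).
angles = [math.radians(d) for d in (5, 20, 45, 70, 85)]
differences = [
    abs(azimuth_from_ratio(math.sin(al) / math.cos(al))
        - azimuth_from_gains(math.cos(al), math.sin(al)))
    for al in angles
]
```

Both expressions yield φ=π/4 when only x₁ carries the directional signal and φ=0 for equal gains.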

Based on FIG. 2, FIG. 4a illustrates a classical PCA coordinate system. FIG. 4b illustrates an intended coordinate system.

Mapping the angle φ to a real loudspeaker spacing includes: Other speaker spacings φ_(x) than the

$90{^\circ}\mspace{14mu} \left( {\phi_{o} = \frac{\pi}{4}} \right)$

addressed in the model can be addressed based on either:

$\begin{matrix}{\phi_{s} = {\phi \frac{\phi_{x}}{\phi_{o}}}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 34}a}\end{matrix}$

or, more accurately,

$\begin{matrix}{{\overset{.}{\phi}}_{s} = {{atan}\left( {{\tan \left( \phi_{x} \right)}\frac{a_{1} - a_{2}}{a_{1} + a_{2}}} \right)}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 34}b}\end{matrix}$

FIG. 5 illustrates two curves, a and b, that relate to a difference between both methods for a 60° loudspeaker spacing

$\left( {\phi_{x} = {30{^\circ}\frac{\pi}{180{^\circ}}}} \right).$

To encode the directional signal to HOA with limited order, the accuracy of the first method

$\left( {\phi_{s} = {\phi \frac{\phi_{x}}{\phi_{o}}}} \right)$

is regarded as being sufficient.
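The deviation between the two mappings of Equation Nos. 34a and 34b may be explored with a short sketch such as the following. The function names and the sampling grid are assumptions for illustration, and the sketch does not reproduce the curves of FIG. 5.

```python
import math

PHI_O = math.pi / 4  # half spacing angle of the 90 degree model


def spacing_linear(phi, phi_x):
    """Eq. 34a: linear rescaling of the model azimuth."""
    return phi * phi_x / PHI_O


def spacing_tangent(a1, a2, phi_x):
    """Eq. 34b: tangent-law mapping for the half spacing angle phi_x."""
    return math.atan(math.tan(phi_x) * (a1 - a2) / (a1 + a2))


def max_difference(phi_x, steps=900):
    """Largest absolute deviation between both mappings over all pannings."""
    worst = 0.0
    for i in range(steps + 1):
        alpha = 0.5 * math.pi * i / steps
        a1, a2 = math.cos(alpha), math.sin(alpha)
        phi = math.atan((a1 - a2) / (a1 + a2))  # Eq. 33
        diff = abs(spacing_linear(phi, phi_x) - spacing_tangent(a1, a2, phi_x))
        worst = max(worst, diff)
    return worst


phi_x = math.radians(30)  # 60 degree loudspeaker spacing
worst_case = max_difference(phi_x)
```

Both mappings coincide at the centre and at the loudspeaker positions; for the 60° spacing the deviation in between stays below two degrees in this sketch.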

Directional and Ambient Signal Extraction

Directional Signal Extraction

The directional signal is extracted as a linear combination with gains g^(T)=[g₁, g₂] of the input signals:

ŝ:=g ^(T) x=g ^(T)(a s+n)   Equation No. 35a

The error signal is

err=s−g ^(T)(a s+n)   Equation No. 35b

and becomes minimal if fully orthogonal to the input signals x with ŝ=s:

E(x err*)=0   Equation No. 36

a P _(ŝ) −a g ^(T) a P _(ŝ) −g P _(N)=0   Equation No. 37

keeping in mind the model assumption that the ambient components are not correlated:

(E(n ₁n*₂)=0)   Equation No. 38

Because the order of calculation of a vector product of the form g^(T) a is interchangeable, g^(T) a=a^(T) g:

(aa ^(T) P _(ŝ) +I P _(N))g=aP _(ŝ)  Equation No. 39

The term in brackets is a square matrix and a solution exists if this matrix is invertible, and by first setting P_(ŝ)=P_(s) the mixing gains become:

$\begin{matrix}{g = {\left( {aa^{T}P_{\hat{s}} + I\,P_{N}} \right)^{- 1}a\,P_{\hat{s}}}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 40a}} \\{\left( {aa^{T}P_{\hat{s}} + I\,P_{N}} \right) = \begin{bmatrix}{{a_{1}^{2}P_{\hat{s}}} + P_{N}} & {a_{1}a_{2}P_{\hat{s}}} \\{a_{1}a_{2}P_{\hat{s}}} & {{a_{2}^{2}P_{\hat{s}}} + P_{N}}\end{bmatrix}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 40b}}\end{matrix}$

Solving this system leads to:

$\begin{matrix}{g = \begin{bmatrix}\frac{a_{1}P_{s}}{P_{s} + P_{N}} \\\frac{a_{2}P_{s}}{P_{s} + P_{N}}\end{bmatrix}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 41}}\end{matrix}$

Post-scaling:

The solution is scaled such that the power of the estimate ŝ becomes P_(s), with

$\begin{matrix}{\mspace{79mu} {P_{\hat{s}} = {{E\left( {\hat{s}{\hat{s}}^{*}} \right)} = {{g^{T}\left( {{{aa}^{T}P_{s}} + {IP}_{N}} \right)}g}}}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 42}a} \\{s = {{\sqrt{\frac{P_{s}}{{g^{T}\left( {{{aa}^{T}P_{s}} + {IP}_{N}} \right)}g}}\hat{s}} = {\sqrt{\frac{P_{s}}{{\left( {{g_{1}a_{1}} + {g_{2}a_{2}}} \right)^{2}P_{s}} + {\left( {g_{1}^{2} + g_{2}^{2}} \right)P_{N}}}}\hat{s}}}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 42}b}\end{matrix}$

Extraction of Ambient Signals

The unscaled first ambient signal can be derived by subtracting the unscaled directional signal component from the first input channel signal:

{circumflex over (n)} ₁ =x ₁ −a ₁ ŝ=x ₁ −a ₁ g ^(T) x:=h ^(T) x  Equation No. 43

Solving this for {circumflex over (n)}₁=h^(T) x leads to

$\begin{matrix}{h = {{\begin{bmatrix}1 \\0\end{bmatrix} - {a_{1}g}} = \begin{bmatrix}\frac{{a_{2}^{2}P_{s}} + P_{N}}{P_{s} + P_{N}} \\\frac{{- a_{1}}a_{2}P_{s}}{P_{s} + P_{N}}\end{bmatrix}}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 44}}\end{matrix}$

The solution is scaled such that the power of the estimate {circumflex over (n)}₁ becomes P_(N), with

$\begin{matrix}{P_{{\hat{n}}_{1}} = {{E\left( {{\hat{n}}_{1}{\hat{n}}_{1}^{*}} \right)} = {{h^{T}{E\left( {xx}^{H} \right)}h} = {{h^{T}\left( {{{aa}^{T}P_{s}} + {IP}_{N}} \right)}h\text{:}}}}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 45}a} \\{\mspace{79mu} {n_{1} = {\sqrt{\frac{P_{N}}{{\left( {{h_{1}a_{1}} + {h_{2}a_{2}}} \right)^{2}P_{s}} + {\left( {h_{1}^{2} + h_{2}^{2}} \right)P_{N}}}}{\hat{n}}_{1}}}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 45}b}\end{matrix}$

The unscaled second ambient signal can be derived by subtracting the rated directional signal component from the second input channel signal:

{circumflex over (n)} ₂ =x ₂ −a ₂ ŝ=x ₂ −a ₂ g ^(T) x:=w ^(T) x  Equation No. 46

Solving this for {circumflex over (n)}₂=w^(T) x leads to

$\begin{matrix}{w = {{\begin{bmatrix}0 \\1\end{bmatrix} - {a_{2}g}} = \begin{bmatrix}\frac{{- a_{1}}a_{2}P_{s}}{P_{s} + P_{N}} \\\frac{{a_{1}^{2}P_{s}} + P_{N}}{P_{s} + P_{N}}\end{bmatrix}}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 47}}\end{matrix}$

The solution is scaled such that the power P_({circumflex over (n)}₂) of the estimate {circumflex over (n)}₂ becomes P_(N), with

$\begin{matrix}{P_{{\hat{n}}_{2}} = {{E\left( {{\hat{n}}_{2}{\hat{n}}_{2}^{*}} \right)} = {{w^{T}{E\left( {xx}^{H} \right)}w} = {{w^{T}\left( {{{aa}^{T}P_{s}} + {IP}_{N}} \right)}w}}}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 48}a} \\{\mspace{79mu} {n_{2} = {\sqrt{\frac{P_{N}}{{\left( {{w_{1}a_{1}} + {w_{2}a_{2}}} \right)^{2}P_{s}} + {\left( {w_{1}^{2} + w_{2}^{2}} \right)P_{N}}}}{\hat{n}}_{2}}}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 48}b}\end{matrix}$
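The ambient extraction of Equation Nos. 43 to 48b may likewise be sketched per tile. The function names are hypothetical, and the sketch assumes P_(N)>0, since the scaling is undefined for a vanishing ambient power.

```python
import math


def ambient_gains(a1, a2, p_s, p_n):
    """Return the gain vectors h (Eq. 44) and w (Eq. 47)."""
    d = p_s + p_n
    h = ((a2 * a2 * p_s + p_n) / d, -a1 * a2 * p_s / d)
    w = (-a1 * a2 * p_s / d, (a1 * a1 * p_s + p_n) / d)
    return h, w


def estimate_power(v, a1, a2, p_s, p_n):
    """Power of v^T x under the signal model, cf. Eq. 45a / Eq. 48a."""
    return (v[0] * a1 + v[1] * a2) ** 2 * p_s + (v[0] ** 2 + v[1] ** 2) * p_n


def ambient_scale(v, a1, a2, p_s, p_n):
    """Square-root factor of Eq. 45b / Eq. 48b."""
    return math.sqrt(p_n / estimate_power(v, a1, a2, p_s, p_n))


def extract_ambient(x1, x2, a1, a2, p_s, p_n):
    """Scaled ambient signals (n1, n2) for one T/F tile."""
    h, w = ambient_gains(a1, a2, p_s, p_n)
    n1 = ambient_scale(h, a1, a2, p_s, p_n) * (h[0] * x1 + h[1] * x2)
    n2 = ambient_scale(w, a1, a2, p_s, p_n) * (w[0] * x1 + w[1] * x2)
    return n1, n2


h, w = ambient_gains(0.8, 0.6, 2.0, 0.5)
scaled_power = (ambient_scale(h, 0.8, 0.6, 2.0, 0.5) ** 2
                * estimate_power(h, 0.8, 0.6, 2.0, 0.5))
```

After scaling, the model power of each ambient estimate equals P_(N), as required by Equation Nos. 45b and 48b.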

Encoding Channel Based Audio to HOA

Naive Approach

Using the covariance matrix, the channel power estimate of x can be expressed by:

P _(x) =tr(C)=tr(E(xx ^(H)))=E(tr(xx ^(H)))=E(tr(x ^(H) x))=E(x ^(H) x)  Eq No. 49

with E( ) representing the expectation and tr( ) representing the trace operators.

When returning to the signal model from section Primary ambient decomposition in T/F domain and the related model assumptions in T/F domain:

x=a s+n,   Equation No. 50a

x ₁ =a ₁ s+n ₁,   Equation No. 50b

x ₂ =a ₂ s+n ₂,   Equation No. 50c

√{square root over (a ₁ ² +a ₂ ²)}=1,   Equation No. 50d

the channel power estimate of x can be expressed by:

P _(x) =E(x ^(H) x)=P _(s)+2P _(N)   Equation No. 51

The value of P_(x) may be proportional to the perceived signal loudness. A perfect remix of x should preserve loudness and lead to the same estimate.

During HOA encoding, e.g., by a mode-matrix Y(Ω_(x)), the spherical harmonics values may be determined from directions Ω_(x) of the virtual speaker positions:

b _(x1) =Y(Ω_(x))x   Equation No. 52

HOA rendering with rendering matrix D with near energy preserving features (e.g., see section 12.4.3 of Reference [1]) may be determined based on:

$\begin{matrix}{{D^{H}D} \approx \frac{I}{\left( {N + 1} \right)^{2}}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 53}}\end{matrix}$

where I is the identity matrix and (N+1)² is a scaling factor depending on HOA order N:

{hacek over (x)}=D Y(Ω_(x))x   Equation No. 54

The signal power estimate of the rendered encoded HOA signal becomes:

$\begin{matrix}{\mspace{79mu} {P_{\overset{\Cup}{x}} = {E\left( {x^{H}{Y\left( \Omega_{x} \right)}^{H}D^{H}{{DY}\left( \Omega_{x} \right)}x} \right)}}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 55}a} \\{{\approx {E\left( {\frac{1}{\left( {N + 1} \right)^{2}}x^{H}{Y\left( \Omega_{x} \right)}^{H}{Y\left( \Omega_{x} \right)}x} \right)}} = {{tr}\left( {{{CY}\left( \Omega_{x} \right)}^{H}{Y\left( \Omega_{x} \right)}\frac{1}{\left( {N + 1} \right)^{2}}} \right)}} & {{{Eq}.\mspace{14mu} {No}.\mspace{14mu} 55}b}\end{matrix}$

The following may be determined then:

P_({hacek over (x)})≈P_(x),   Equation No. 55c

This may lead to:

Y(Ω_(x))^(H) Y(Ω_(x)):=(N+1)² I,   Equation No. 56

which usually cannot be fulfilled for mode matrices related to arbitrary positions. The consequences of Y(Ω_(x))^(H)Y(Ω_(x)) not becoming diagonal are timbre colorations and loudness fluctuations. Y(Ω_(id)) becomes an un-normalised unitary matrix only for special positions (directions) Ω_(id) where the number of positions (directions) is equal to or greater than (N+1)² and at the same time where the angular distance to next neighbour positions is constant for every position (i.e. a regular sampling on a sphere).
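As a hedged illustration of why Equation No. 56 usually cannot be fulfilled, the product Y(Ω_(x))^(H)Y(Ω_(x)) can be evaluated for two virtual loudspeakers at ±30° azimuth. The sketch assumes HOA order N=1 with N3D normalisation and uses hypothetical function names.

```python
import math


def mode_vector_first_order(theta, phi):
    """First-order (N=1) N3D real-valued spherical harmonics."""
    s3 = math.sqrt(3.0)
    return [
        1.0,                                   # Y_0^0
        s3 * math.sin(theta) * math.sin(phi),  # Y_1^-1
        s3 * math.cos(theta),                  # Y_1^0
        s3 * math.sin(theta) * math.cos(phi),  # Y_1^1
    ]


def gram_matrix(directions):
    """Y^H Y for a mode matrix whose columns are the given directions."""
    vectors = [mode_vector_first_order(t, p) for t, p in directions]
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


# Two virtual loudspeakers in the horizontal plane, 60 degrees apart.
speakers = [(math.pi / 2, math.radians(30)), (math.pi / 2, math.radians(-30))]
gram = gram_matrix(speakers)
```

The diagonal entries equal (N+1)²=4 as required, but the off-diagonal entries equal 2.5 instead of zero, so the condition of Equation No. 56 is violated for this loudspeaker pair.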

Regarding the impact of maintaining the intended signal directions when encoding channel based content to HOA and decoding:

Let x=a s, where the ambient parts are zero. Encoding to HOA and rendering leads to {circumflex over (x)}=D Y(Ω_(x))a s.

Only rendering matrices satisfying D Y(Ω_(x))=I would lead to the same spatial impression as replaying the original. Generally, D=Y(Ω_(x))⁻¹ does not exist and using the pseudo inverse will in general not lead to D Y(Ω_(x))=I.

Generally, when receiving HOA content, the encoding matrix is unknown and rendering matrices D should be independent from the content.

FIG. 6 shows exemplary curves related to altering panning directions by naive HOA encoding of two-channel content, for two loudspeaker channels that are 60° apart. FIG. 6 illustrates panning gains gn_(l) and gn_(r) of a signal moving from right to left and energy sum

sumEn=√{square root over (gn_(l) ² +gn _(r) ²)}  Equation No. 57

The top part shows VBAP or tangent law amplitude panning gains. The mid and bottom parts show naive HOA encoding and 2-channel rendering of a VBAP panned signal, for N=2 in the mid and for N=6 at the bottom. Perceptually the signal gets louder when the signal source is at mid position, and all directions except the extreme side positions will be warped towards the mid position. Section 6a of FIG. 6 relates to VBAP or tangent law amplitude panning gains. Section 6b of FIG. 6 relates to a naive HOA encoding and 2-channel rendering of a VBAP panned signal for N=2. Section 6c relates to naive HOA encoding and 2-channel rendering of a VBAP panned signal for N=6.

PAD Approach

Encoding the signal

x=a s+n   Equation No. 58a

after performing PAD and HOA upconversion leads to

$b_{x2} = y_{s}\, s + \Psi_{\dddot{n}}\, \hat{n},$   Equation No. 58b

with

$\hat{n} = \mathrm{diag}(g_{L})\, \dddot{n}$   Equation No. 58c

The power estimate of the rendered HOA signal becomes:

$\begin{matrix}{P_{\overset{\sim}{x}} = {{{E\left( {b_{x\; 2}^{H}D^{H}{Db}_{x\; 2}} \right)} \approx {E\left( {\frac{1}{\left( {N + 1} \right)^{2}}b_{x\; 2}^{H}b_{x\; 2}} \right)}} = {E\left( {\frac{1}{\left( {N + 1} \right)^{2}}\left( {{s^{*}y_{s}^{H}y_{s}s} + {{\hat{n}}^{H}\Psi_{\overset{...}{n}}^{H}\Psi_{\overset{...}{n}}\hat{n}}} \right)} \right)}}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 59}}\end{matrix}$

For N3D normalised SH:

y _(s) ^(H) y _(s)=(N+1)²   Equation No. 60

and, taking into account that all signals of {circumflex over (n)} are uncorrelated, the same applies to the noise part:

P _({tilde over (x)}) ≈P _(s)+Σ_(l=1) ^(L) P _(n) _(l) =P _(s) +P _(N)Σ_(l=1) ^(L) g _(l) ²,   Equation No. 61

and ambient gains g_(L)=[1,1,0,0,0,0] can be used for scaling the ambient signal power

Σ_(l=1) ^(L) P _(n) _(l) =2P _(N)   Equation No. 62a

and

P_({tilde over (x)})=P_(x).   Equation No. 62b

The intended directionality of s now is given by Dy_(s), which leads to a classical HOA panning vector which, for a stage_width of 1, captures the intended directivity.

HOA Format

Higher Order Ambisonics (HOA) is based on the description of a sound field within a compact area of interest, which is assumed to be free of sound sources, see [1]. In that case the spatio-temporal behaviour of the sound pressure p(t, {circumflex over (Ω)}) at time t and position {circumflex over (Ω)} within the area of interest is physically fully determined by the homogeneous wave equation. The spherical coordinate system of FIG. 2 is assumed. In the used coordinate system the x axis points to the frontal position, the y axis points to the left, and the z axis points to the top. A position in space {circumflex over (Ω)}=(r,θ,ϕ)^(T) is represented by a radius r>0 (i.e. the distance to the coordinate origin), an inclination angle θ ∈ [0,π] measured from the polar axis z and an azimuth angle ϕ ∈ [0,2π[ measured counter-clockwise in the x-y plane from the x axis. Further, (⋅)^(T) denotes the transposition.

A Fourier transform (e.g., see Reference [10]) of the sound pressure with respect to time, denoted by $\mathcal{F}_{t}(\cdot)$, i.e.

$P(\omega, \hat{\Omega}) = \mathcal{F}_{t}\left( p(t, \hat{\Omega}) \right) = \int_{-\infty}^{\infty} p(t, \hat{\Omega})\, e^{-i\omega t}\, dt,$   Equation No. 63

with ω denoting the angular frequency and i indicating the imaginary unit, can be expanded into a series of Spherical Harmonics according to

P(ω=k c _(s) , r, θ, ϕ)=Σ_(n=0) ^(N) Σ_(m=−n) ^(n) A _(n) ^(m)(k)j_(n)(kr)Y _(n) ^(m)(θ, ϕ)   Equation No. 64

Here c_(s) denotes the speed of sound and k denotes the angular wave number, which is related to the angular frequency ω by

$k = {\frac{\omega}{c_{s}}.}$

Further, j_(n)(⋅) denote the spherical Bessel functions of the first kind and Y_(n) ^(m)(θ, ϕ) denote the real valued Spherical Harmonics of order n and degree m, which are defined below. The expansion coefficients A_(n) ^(m)(k) only depend on the angular wave number k. It has been implicitly assumed that sound pressure is spatially band-limited. Thus, the series is truncated with respect to the order index n at an upper limit N, which is called the order of the HOA representation.

If the sound field is represented by a superposition of an infinite number of harmonic plane waves of different angular frequencies ω and arriving from all possible directions specified by the angle tuple (θ, ϕ), the respective plane wave complex amplitude function B(ω, θ, ϕ) can be expressed by the following Spherical Harmonics expansion

B(ω=kc _(s), θ, ϕ)=Σ_(n=0) ^(N) Σ_(m=−n) ^(n) B _(n) ^(m)(k)Y _(n)^(m)(θ, ϕ)   Equation No. 65

where the expansion coefficients B_(n) ^(m)(k) are related to the expansion coefficients A_(n) ^(m)(k) by

A _(n) ^(m)(k)=i ^(n) B _(n) ^(m)(k)   Equation No. 66

Assuming the individual coefficients B_(n) ^(m)(ω=kc_(s)) to be functions of the angular frequency ω, the application of the inverse Fourier transform (denoted by $\mathcal{F}_{t}^{-1}(\cdot)$) provides time domain functions

$\begin{matrix}{{b_{n}^{m}(t)} = {{\mathcal{F}_{t}^{- 1}\left( {B_{n}^{m}\left( {\omega/c_{s}} \right)} \right)} = {\frac{1}{2\pi}{\int_{- \infty}^{\infty}{{B_{n}^{m}\left( \frac{\omega}{c_{s}} \right)}e^{i\; \omega \; t}d\; \omega}}}}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 67}}\end{matrix}$

for each order n and degree m, which can be collected in a single vector b(t) by

$b(t) = [\,b_{0}^{0}(t)\ \ b_{1}^{-1}(t)\ \ b_{1}^{0}(t)\ \ b_{1}^{1}(t)\ \ b_{2}^{-2}(t)\ \ b_{2}^{-1}(t)\ \ b_{2}^{0}(t)\ \ b_{2}^{1}(t)\ \ b_{2}^{2}(t)\ \ \ldots\ \ b_{N}^{N-1}(t)\ \ b_{N}^{N}(t)\,]^{T}.$   Equation No. 68

The position index of a time domain function b_(n) ^(m)(t) within the vector b(t) is given by n(n+1)+1+m. The overall number of elements in the vector b(t) is given by O=(N+1)².
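The indexing rule can be illustrated with a few lines of Python. The function names are hypothetical, and the inverse mapping from a position back to order and degree is added here only as a convenience.

```python
import math


def hoa_position(n, m):
    """1-based position of b_n^m(t) within b(t): n(n+1)+1+m."""
    if n < 0 or abs(m) > n:
        raise ValueError("require n >= 0 and -n <= m <= n")
    return n * (n + 1) + 1 + m


def hoa_num_coefficients(order):
    """Number of elements of b(t) for HOA order N: (N+1)^2."""
    return (order + 1) ** 2


def hoa_order_degree(position):
    """Inverse mapping from a 1-based position to (n, m)."""
    n = math.isqrt(position - 1)
    return n, position - 1 - n * (n + 1)


positions = [hoa_position(n, m) for n in range(3) for m in range(-n, n + 1)]
```

For HOA order N=2 the positions run from 1 to 9 in the order given by Equation No. 68.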

The final Ambisonics format provides the sampled version of b(t) using a sampling frequency f_(S) as

$\{ b(lT_{S}) \}_{l \in \mathbb{N}} = \{ b(T_{S}),\ b(2T_{S}),\ b(3T_{S}),\ b(4T_{S}),\ \ldots \},$   Equation No. 69

where T_(S)=1/f_(S) denotes the sampling period. The elements of b(lT_(S)) are here referred to as Ambisonics coefficients. The time domain signals b_(n) ^(m)(t) and hence the Ambisonics coefficients are real-valued.

Definition of Real-Valued Spherical Harmonics

The real-valued spherical harmonics Y_(n) ^(m)(θ, ϕ) (assuming N3D normalisation) are given by

$\begin{matrix}{{Y_{n}^{m}(\theta,\phi)} = {\sqrt{(2n + 1)\frac{(n - |m|)!}{(n + |m|)!}}\ P_{n,|m|}(\cos\theta)\ \mathrm{trg}_{m}(\phi)}\quad \text{with}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 70a}} \\{{\mathrm{trg}_{m}(\phi)} = \left\{ \begin{matrix}{\sqrt{2}\cos(m\phi)} & {m > 0} \\1 & {m = 0} \\{-\sqrt{2}\sin(m\phi)} & {m < 0}\end{matrix} \right.} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 70b}}\end{matrix}$

The associated Legendre functions P_(n,m)(x) are defined as

$\begin{matrix}{{{P_{n,m}(x)} = {\left( {1 - x^{2}} \right)^{m/2}\frac{d^{m}}{{dx}^{m}}{P_{n}(x)}}},{m \geq 0}} & {{Equation}\mspace{14mu} {{No}.\mspace{14mu} 70}c}\end{matrix}$

with the Legendre polynomial P_(n)(x) and without the Condon-Shortley phase term (−1)^(m).
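A minimal sketch of Equation Nos. 70a to 70c is given below. The function names are hypothetical, and the associated Legendre functions are evaluated with a standard upward recurrence, which is an implementation choice of this sketch rather than part of the definition.

```python
import math


def assoc_legendre(n, m, x):
    """P_{n,m}(x) for 0 <= m <= n without the Condon-Shortley phase (Eq. 70c)."""
    p_mm = 1.0
    for i in range(1, m + 1):
        p_mm *= 2 * i - 1                        # (2m-1)!!
    p_mm *= max(0.0, 1.0 - x * x) ** (m / 2.0)   # (1-x^2)^(m/2)
    if n == m:
        return p_mm
    p_prev, p_curr = p_mm, x * (2 * m + 1) * p_mm
    for l in range(m + 2, n + 1):
        p_prev, p_curr = p_curr, ((2 * l - 1) * x * p_curr
                                  - (l + m - 1) * p_prev) / (l - m)
    return p_curr


def real_sh(n, m, theta, phi):
    """N3D normalised real-valued spherical harmonic Y_n^m (Eq. 70a, 70b)."""
    am = abs(m)
    norm = math.sqrt((2 * n + 1) * math.factorial(n - am) / math.factorial(n + am))
    if m > 0:
        trg = math.sqrt(2.0) * math.cos(m * phi)
    elif m == 0:
        trg = 1.0
    else:
        trg = -math.sqrt(2.0) * math.sin(m * phi)
    return norm * assoc_legendre(n, am, math.cos(theta)) * trg


def mode_vector(order, theta, phi):
    """Mode vector with elements ordered as in Eq. 68."""
    return [real_sh(n, m, theta, phi)
            for n in range(order + 1) for m in range(-n, n + 1)]


y = mode_vector(3, 1.1, 0.7)
```

For any direction, the squared elements of an N3D mode vector of order N sum to (N+1)², which is the property used in Equation No. 60.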

Definition of the Mode Matrix

The mode matrix $\Psi^{(N_{1},N_{2})}$ of order N₁ with respect to the directions

$\Omega_{q}^{(N_{2})},\quad q = 1, \ldots, O_{2} = (N_{2} + 1)^{2}$ (cf. [11])   Equation No. 71

related to order N₂ is defined by

$\Psi^{(N_{1},N_{2})} := [\,y_{1}^{(N_{1})}\ \ y_{2}^{(N_{1})}\ \ \ldots\ \ y_{O_{2}}^{(N_{1})}\,] \in \mathbb{R}^{O_{1} \times O_{2}}$   Equation No. 72

with

$y_{q}^{(N_{1})} := [\,Y_{0}^{0}(\Omega_{q}^{(N_{2})})\ \ Y_{1}^{-1}(\Omega_{q}^{(N_{2})})\ \ Y_{1}^{0}(\Omega_{q}^{(N_{2})})\ \ Y_{1}^{1}(\Omega_{q}^{(N_{2})})\ \ Y_{2}^{-2}(\Omega_{q}^{(N_{2})})\ \ Y_{2}^{-1}(\Omega_{q}^{(N_{2})})\ \ \ldots\ \ Y_{N_{1}}^{N_{1}}(\Omega_{q}^{(N_{2})})\,]^{T} \in \mathbb{R}^{O_{1}}$   Equation No. 73

denoting the mode vector of order N₁ with respect to the directions $\Omega_{q}^{(N_{2})}$, where O₁=(N₁+1)².

A digital audio signal generated as described above can be related to a video signal, with subsequent rendering.

FIG. 7 illustrates an exemplary method for determining 3D audio scene and object based content from two-channel stereo based content. At 710, two-channel stereo based content may be received. The content may be converted into the T/F domain. For example, at 710, a two-channel stereo signal x(t) may be partitioned into overlapping sample blocks. The partitioned signals are transformed into the time-frequency domain (T/F) using a filter-bank, such as, for example, by means of an FFT. The transformation may determine T/F tiles.

At 720, direct and ambient components are determined. For example, the direct and ambient components may be determined in the T/F domain. At 730, audio scene (e.g., HOA) and object based audio (e.g., a centre channel direction handled as a static object channel) may be determined. The processing at 720 and 730 may be performed in accordance with the principles described in connection with A-E and Equation Nos. 1-72.

FIG. 8 illustrates a computing device 800 that may implement the method of FIG. 7. The computing device 800 may include components 830, 840 and 850 that are each, respectively, configured to perform the functions of 710, 720 and 730. It is further understood that the respective units may be embodied by a processor 810 of a computing device that is adapted to perform the processing carried out by each of said respective units, i.e. that is adapted to carry out some or all of the aforementioned steps, as well as any further steps of the proposed encoding method. The computing device may further comprise a memory 820 that is accessible by the processor 810.

It should be noted that the description and drawings merely illustrate the principles of the proposed methods and apparatus. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the invention and are included within its spirit and scope. Furthermore, all examples recited herein are principally intended expressly to be only for pedagogical purposes to aid the reader in understanding the principles of the proposed methods and apparatus and the concepts contributed by the inventors to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the invention, as well as specific examples thereof, are intended to encompass equivalents thereof.

The methods and apparatus described in the present document may be implemented as software, firmware and/or hardware. Certain components may e.g. be implemented as software running on a digital signal processor or microprocessor. Other components may e.g. be implemented as hardware and/or as application specific integrated circuits. The signals encountered in the described methods and apparatus may be stored on media such as random access memory or optical storage media. They may be transferred via networks, such as radio networks, satellite networks, wireless networks or wireline networks, e.g. the Internet.

The described processing can be carried out by a single processor or electronic circuit, or by several processors or electronic circuits operating in parallel and/or operating on different parts of the complete processing.

The instructions for operating the processor or the processors according to the described processing can be stored in one or more memories. The at least one processor is configured to carry out these instructions.

1. A method for determining 3D audio scene and object based content from two-channel stereo based content represented by a plurality of time/frequency (T/F) tiles, comprising: determining, for each T/F tile, ambient power, direct power, source directions and mixing coefficients; determining, for each tile, a directional signal and two ambient T/F channels based on the corresponding ambient power, direct power, and mixing coefficients; and determining the 3D audio scene and object based content based on the directional signal and ambient T/F channels of the T/F tiles.
2. Apparatus for generating 3D audio scene and object based content from two-channel stereo based content, said apparatus comprising: a processor configured to receive the two-channel stereo based content represented by a plurality of time/frequency (T/F) tiles; wherein the processor is further configured to determine, for each tile, ambient power, direct power, a source direction and mixing coefficients; wherein the processor is configured to determine, for each tile, a directional signal and two ambient T/F channels based on the corresponding ambient power, direct power, and mixing coefficients; and wherein the processor is further configured to determine the 3D audio scene and object based content based on the directional signal and ambient T/F channels of the T/F tiles.
3. The method of claim 1, wherein, for each tile, a new source direction is determined based on the source direction φ_(s)({circumflex over (t)}, k), and, based on a determination that the new source direction is within a predetermined interval, a directional centre channel object signal o_(c)({circumflex over (t)}, k) is determined based on the directional signal, the directional centre channel object signal o_(c)({circumflex over (t)}, k) corresponding to the object based content, and, based on a determination that the new source direction is outside the predetermined interval, a directional HOA signal b_(s)({circumflex over (t)}, k) is determined based on the new source direction.
4. The method of claim 1, wherein, for each tile, additional ambient signal channels $\dddot{n}(\hat{t}, k)$ are determined based on a de-correlation of the two ambient T/F channels, and ambient HOA signals $b_{\dddot{n}}(\hat{t}, k)$ are determined based on the additional ambient signal channels.
5. The method of claim 3, wherein the 3D audio scene content is based on the directional HOA signals b_(s)({circumflex over (t)}, k) and the ambient HOA signals $b_{\dddot{n}}(\hat{t}, k)$.
6. A method according to claim 1, wherein the two-channel stereo signal x(t) is partitioned into overlapping sample blocks and the sample blocks are transformed into T/F tiles based on a filter-bank or an FFT.
7. A method according to the method of claim 1, wherein said transformation into time domain is carried out using a filter-bank or an IFFT.
8. A method according to the method of claim 1, wherein the 3D audio scene and object based content are based on an MPEG-H 3D Audio data standard.
9. A method according to the method of claim 1 further including:
calculating for each tile in T/F domain a correlation matrix
$C(\hat{t},k) = E\left( x(\hat{t},k)\,x(\hat{t},k)^{H} \right) = \begin{bmatrix} c_{11}(\hat{t},k) & c_{12}(\hat{t},k) \\ c_{21}(\hat{t},k) & c_{22}(\hat{t},k) \end{bmatrix},$
with E( ) denoting an expectation operator;
calculating the Eigenvalues of C({circumflex over (t)}, k) by:
λ₁({circumflex over (t)}, k)=1/2(c ₂₂ +c ₁₁+√{square root over ((c ₁₁ −c ₂₂)²+4|c _(r12)|²)}),
λ₂({circumflex over (t)}, k)=1/2(c ₂₂ +c ₁₁−√{square root over ((c ₁₁ −c ₂₂)²+4|c _(r12)|²)}),
with c_(r12)=real(c₁₂) denoting the real part of c₁₂;
calculating from C({circumflex over (t)}, k) estimations P_(N)({circumflex over (t)}, k) of ambient power P_(N)({circumflex over (t)}, k)=λ₂({circumflex over (t)}, k), estimations P_(s)({circumflex over (t)}, k) of directional power P_(s)({circumflex over (t)}, k)=λ₁({circumflex over (t)}, k)−P_(N)({circumflex over (t)}, k), elements of a gain vector a({circumflex over (t)}, k)=[a₁({circumflex over (t)}, k), a₂({circumflex over (t)}, k)]^(T) that mixes the directional components into x({circumflex over (t)}, k) and which are determined by:
$a_{1}(\hat{t},k) = \frac{1}{\sqrt{1 + A(\hat{t},k)^{2}}},\quad a_{2}(\hat{t},k) = \frac{A(\hat{t},k)}{\sqrt{1 + A(\hat{t},k)^{2}}},\quad \text{with}\ A(\hat{t},k) = \frac{\lambda_{1}(\hat{t},k) - c_{11}}{c_{r12}};$
calculating an azimuth angle of virtual source direction s({circumflex over (t)}, k) to be extracted by
$\phi_{s}(\hat{t},k) = \left( \mathrm{atan}\left( \frac{1}{A(\hat{t},k)} \right) - \frac{\pi}{4} \right)\frac{\phi_{x}}{\pi/4},$
with φ_(x) giving the loudspeaker position azimuth angle related to signal x₁ in radian, thereby assuming that −φ_(x) is the position related to x₂;
for each T/F tile ({circumflex over (t)}, k), extracting a first directional intermediate signal by ŝ:=g^(T)x with
$g = \begin{bmatrix} \frac{a_{1}P_{s}}{P_{s} + P_{N}} \\ \frac{a_{2}P_{s}}{P_{s} + P_{N}} \end{bmatrix};$
scaling said first directional intermediate signal in order to derive a corresponding directional signal
$s = \sqrt{\frac{P_{s}}{\left( g_{1}a_{1} + g_{2}a_{2} \right)^{2}P_{s} + \left( g_{1}^{2} + g_{2}^{2} \right)P_{N}}}\ \hat{s};$
deriving the elements of the ambient signal n=[n₁, n₂]^(T) by first calculating intermediate values
$\hat{n}_{1} = h^{T}x\ \text{with}\ h = \begin{bmatrix} \frac{a_{2}^{2}P_{s} + P_{N}}{P_{s} + P_{N}} \\ \frac{-a_{1}a_{2}P_{s}}{P_{s} + P_{N}} \end{bmatrix}\ \text{and}\ \hat{n}_{2} = w^{T}x\ \text{with}\ w = \begin{bmatrix} \frac{-a_{1}a_{2}P_{s}}{P_{s} + P_{N}} \\ \frac{a_{1}^{2}P_{s} + P_{N}}{P_{s} + P_{N}} \end{bmatrix},$
followed by scaling of these values:
$n_{1} = \sqrt{\frac{P_{N}}{\left( h_{1}a_{1} + h_{2}a_{2} \right)^{2}P_{s} + \left( h_{1}^{2} + h_{2}^{2} \right)P_{N}}}\ \hat{n}_{1},\quad n_{2} = \sqrt{\frac{P_{N}}{\left( w_{1}a_{1} + w_{2}a_{2} \right)^{2}P_{s} + \left( w_{1}^{2} + w_{2}^{2} \right)P_{N}}}\ \hat{n}_{2};$
calculating for said directional components a new source direction ϕ_(s)({circumflex over (t)}, k) by ϕ_(s)({circumflex over (t)}, k)=

φ_(s)({circumflex over (t)}, k), with stage_width

; if |ϕ_(s)({circumflex over (t)}, k)| is smaller than acenter_channel_capture_width value, setting o_(c)({circumflex over (t)},k)=s({circumflex over (t)}, k) and b_(s)({circumflex over (t)}, k)=0,else setting or o_(c)({circumflex over (t)}, k)=0 and b_(s)({circumflexover (t)}, k)s({circumflex over (t)}, k), whereby y_(s)({circumflex over(t)}, k) is a spherical harmonic encoding vector derived from{circumflex over (φ)}_(s)({circumflex over (t)},k) and adirect_sound_encoding_elevation θ_(S), y_(s)({circumflex over (t)},k)=[Y₀ ⁰(θ_(S), ϕ_(s)), Y₁ ⁻¹(θ_(S), ϕ_(s)), . . . , Y_(N) ^(N)(θ_(S),ϕ_(s))]^(T).
 10. The apparatus of claim 2, wherein the processor is further configured to determine, for each tile, a new source direction based on the source direction $\varphi_{s}(\hat{t},k)$, and, based on a determination that the new source direction is within a predetermined interval, to determine a directional centre channel object signal $o_{c}(\hat{t},k)$ based on the directional signal, the directional centre channel object signal $o_{c}(\hat{t},k)$ corresponding to the object based content; wherein the processor is further configured to determine, based on a determination that the new source direction is outside the predetermined interval, a directional HOA signal $b_{s}(\hat{t},k)$ based on the new source direction.
 11. The apparatus of claim 2, wherein the processor is configured to determine, for each tile, additional ambient signal channels based on a de-correlation of the two ambient T/F channels, and to determine ambient HOA signals $b_{\tilde{n}}(\hat{t},k)$ based on the additional ambient signal channels.
 12. The apparatus of claim 2, wherein the 3D audio scene content is based on the directional HOA signals $b_{s}(\hat{t},k)$ and the ambient HOA signals $b_{\tilde{n}}(\hat{t},k)$.
 13. The apparatus of claim 2, wherein the two-channel stereo signal x(t) is partitioned into overlapping sample blocks and the sample blocks are transformed into T/F tiles based on a filter-bank or an FFT.
 14. The apparatus of claim 2, wherein said transformation into time domain is carried out using a filter-bank or an IFFT.
 15. The apparatus of claim 2, wherein the 3D audio scene and object based content are based on an MPEG-H 3D Audio data standard.
 16. The apparatus of claim 2, wherein the 3D audio scene and object based content are based on an MPEG-H 3D Audio data standard.
 17. The apparatus of claim 2, wherein the processor is further configured to:
calculate for each tile in the T/F domain a correlation matrix
$C(\hat{t},k) = E\left(x(\hat{t},k)\,x(\hat{t},k)^{H}\right) = \begin{bmatrix} c_{11}(\hat{t},k) & c_{12}(\hat{t},k) \\ c_{21}(\hat{t},k) & c_{22}(\hat{t},k) \end{bmatrix},$
with $E(\cdot)$ denoting an expectation operator;
calculate the eigenvalues of $C(\hat{t},k)$ by
$\lambda_{1}(\hat{t},k) = \tfrac{1}{2}\left(c_{22} + c_{11} + \sqrt{(c_{11} - c_{22})^{2} + 4\,|c_{r12}|^{2}}\right),$
$\lambda_{2}(\hat{t},k) = \tfrac{1}{2}\left(c_{22} + c_{11} - \sqrt{(c_{11} - c_{22})^{2} + 4\,|c_{r12}|^{2}}\right),$
with $c_{r12} = \mathrm{real}(c_{12})$ denoting the real part of $c_{12}$;
calculate from $C(\hat{t},k)$ estimations $P_{N}(\hat{t},k)$ of ambient power, $P_{N}(\hat{t},k) = \lambda_{2}(\hat{t},k)$, estimations $P_{s}(\hat{t},k)$ of directional power, $P_{s}(\hat{t},k) = \lambda_{1}(\hat{t},k) - P_{N}(\hat{t},k)$, and elements of a gain vector $a(\hat{t},k) = [a_{1}(\hat{t},k), a_{2}(\hat{t},k)]^{T}$ that mixes the directional components into $x(\hat{t},k)$ and which are determined by
$a_{1}(\hat{t},k) = \frac{1}{\sqrt{1 + A(\hat{t},k)^{2}}}, \quad a_{2}(\hat{t},k) = \frac{A(\hat{t},k)}{\sqrt{1 + A(\hat{t},k)^{2}}}, \quad \text{with } A(\hat{t},k) = \frac{\lambda_{1}(\hat{t},k) - c_{11}}{c_{r12}};$
calculate an azimuth angle of the virtual source direction of the directional signal $s(\hat{t},k)$ to be extracted by
$\varphi_{s}(\hat{t},k) = \left(\operatorname{atan}\left(\frac{1}{A(\hat{t},k)}\right) - \frac{\pi}{4}\right)\frac{\varphi_{x}}{\pi/4},$
with $\varphi_{x}$ giving the loudspeaker position azimuth angle related to signal $x_{1}$ in radian, thereby assuming that $-\varphi_{x}$ is the position related to $x_{2}$;
for each T/F tile $(\hat{t},k)$, extract a first directional intermediate signal by $\hat{s} := g^{T}x$ with
$g = \begin{bmatrix} \frac{a_{1}P_{s}}{P_{s} + P_{N}} \\ \frac{a_{2}P_{s}}{P_{s} + P_{N}} \end{bmatrix};$
scale said first directional intermediate signal in order to derive a corresponding directional signal
$s = \sqrt{\frac{P_{s}}{(g_{1}a_{1} + g_{2}a_{2})^{2}P_{s} + (g_{1}^{2} + g_{2}^{2})P_{N}}}\;\hat{s};$
derive the elements of the ambient signal $n = [n_{1}, n_{2}]^{T}$ by first calculating intermediate values
$\hat{n}_{1} = h^{T}x \text{ with } h = \begin{bmatrix} \frac{a_{2}^{2}P_{s} + P_{N}}{P_{s} + P_{N}} \\ \frac{-a_{1}a_{2}P_{s}}{P_{s} + P_{N}} \end{bmatrix} \text{ and } \hat{n}_{2} = w^{T}x \text{ with } w = \begin{bmatrix} \frac{-a_{1}a_{2}P_{s}}{P_{s} + P_{N}} \\ \frac{a_{1}^{2}P_{s} + P_{N}}{P_{s} + P_{N}} \end{bmatrix},$
followed by scaling of these values:
$n_{1} = \sqrt{\frac{P_{N}}{(h_{1}a_{1} + h_{2}a_{2})^{2}P_{s} + (h_{1}^{2} + h_{2}^{2})P_{N}}}\;\hat{n}_{1}, \quad n_{2} = \sqrt{\frac{P_{N}}{(w_{1}a_{1} + w_{2}a_{2})^{2}P_{s} + (w_{1}^{2} + w_{2}^{2})P_{N}}}\;\hat{n}_{2};$
calculate for said directional components a new source direction $\phi_{s}(\hat{t},k)$ by $\phi_{s}(\hat{t},k) = \mathcal{W}\,\varphi_{s}(\hat{t},k)$, with stage_width $\mathcal{W}$;
if $|\phi_{s}(\hat{t},k)|$ is smaller than a center_channel_capture_width value, set $o_{c}(\hat{t},k) = s(\hat{t},k)$ and $b_{s}(\hat{t},k) = 0$, else set $o_{c}(\hat{t},k) = 0$ and $b_{s}(\hat{t},k) = y_{s}(\hat{t},k)\,s(\hat{t},k)$, whereby $y_{s}(\hat{t},k)$ is a spherical harmonic encoding vector derived from $\phi_{s}(\hat{t},k)$ and a direct_sound_encoding_elevation $\theta_{S}$, $y_{s}(\hat{t},k) = [Y_{0}^{0}(\theta_{S},\phi_{s}), Y_{1}^{-1}(\theta_{S},\phi_{s}), \ldots, Y_{N}^{N}(\theta_{S},\phi_{s})]^{T}$.
 17. A non-transitory computer readable storage medium containing instructions that when executed by a processor perform the method of claim 1.
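
For illustration only, the per-tile calculation recited in the claims above (eigenvalues of the correlation matrix, power estimates, gain vector, azimuth, signal extraction and centre-channel routing) may be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names `decompose_tile` and `route_directional` and all parameter names are hypothetical, the ratio A is taken to use the larger eigenvalue, the stage width is applied as a simple multiplicative factor, the real part of c12 is assumed to be non-zero, and the spherical harmonic encoding vector is not computed.

```python
import numpy as np

def decompose_tile(C, x, phi_x):
    """Directional/ambient split of one T/F tile (illustrative sketch).

    C     : 2x2 correlation matrix of the tile
    x     : length-2 vector with the two stereo T/F samples
    phi_x : azimuth (radian) of the loudspeaker related to x[0]
    """
    c11, c22 = C[0, 0].real, C[1, 1].real
    cr12 = C[0, 1].real  # assumed non-zero in this sketch
    root = np.sqrt((c11 - c22) ** 2 + 4.0 * cr12 ** 2)
    lam1 = 0.5 * (c22 + c11 + root)
    lam2 = 0.5 * (c22 + c11 - root)
    P_N = lam2          # ambient power estimate
    P_s = lam1 - P_N    # directional power estimate

    # Gain vector that mixes the directional component into x.
    A = (lam1 - c11) / cr12
    a = np.array([1.0, A]) / np.sqrt(1.0 + A ** 2)

    # Azimuth of the virtual source between the loudspeakers.
    phi_s = (np.arctan(1.0 / A) - np.pi / 4) * phi_x / (np.pi / 4)

    tot = P_s + P_N
    g = a * P_s / tot
    h = np.array([a[1] ** 2 * P_s + P_N, -a[0] * a[1] * P_s]) / tot
    w = np.array([-a[0] * a[1] * P_s, a[0] ** 2 * P_s + P_N]) / tot

    def scaled(v, target_power):
        # Apply v^T x and rescale to the target power of the signal model.
        denom = (v @ a) ** 2 * P_s + (v @ v) * P_N
        return np.sqrt(target_power / denom) * (v @ x)

    s = scaled(g, P_s)
    n = np.array([scaled(h, P_N), scaled(w, P_N)])
    return phi_s, P_s, P_N, s, n

def route_directional(phi_s, s, stage_width, center_capture_width):
    """Return (new direction, centre object sample, sample left for HOA encoding)."""
    phi_new = stage_width * phi_s  # assumption: stage width acts as a factor
    if abs(phi_new) < center_capture_width:
        return phi_new, s, 0.0     # centre channel object, no HOA part
    return phi_new, 0.0, s         # to be multiplied by the encoding vector

# Usage: source panned 30 degrees inside the gain plane, loudspeakers at +/-30 degrees.
a_true = np.array([np.cos(np.pi / 6), np.sin(np.pi / 6)])
C = np.outer(a_true, a_true) + 0.1 * np.eye(2)   # P_s = 1, P_N = 0.1
phi_s, P_s, P_N, s, n = decompose_tile(C, a_true.astype(complex), np.pi / 6)
print(np.degrees(phi_s), P_s, P_N)
```

In the usage lines the correlation matrix is built directly from the signal model (directional power 1, ambient power 0.1), so the eigenvalue formulas should return those two powers again. The second function only decides whether the directional sample becomes the centre channel object or is passed on for HOA encoding; multiplying by the spherical harmonic vector is left out.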